Text updated on the 27th of June 2021
In part 3 we saw that, in order to ensure a higher degree of autonomy for our services, we need to avoid (synchronous) 2 way communication (RPC/REST/etc.) between services and instead use 1 way communication.
A higher level of autonomy goes hand in hand with a lower degree of coupling. The less coupling we have, the less we need to bother with contract and data versioning.
We also increase our services' stability – a failure in other services doesn't directly affect our service's ability to respond to stimuli.
But how can we get any work done, if we only use 1 way communication? How can we get any data back from other services this way?
The short answer is that you can't, but with well defined Service Boundaries you (in most cases) shouldn't need to call other services directly from your service to get data back.
What is a service boundary?
It's basically a term used to define the business data and functionality that a Service is responsible for. In Microservices: synchronous communication, data ownership and coupling we covered Service principles such as Boundaries and Autonomy in detail.
Boundaries determine what’s inside and outside of a Service. In part 2 we used the aggregate pattern to analyse which data belonged inside the Legal Entity service.
In the case of the Legal Entity service we realised that Legal Entity and Addresses belonged together, because a LegalEntity and its associated Addresses were created, changed and deleted together. By replacing two services with one we gained full autonomy for the Legal Entity service, whereby we could avoid the need for orchestration and for handling all the error scenarios that can result from orchestrating data-changing calls between services (the LegalEntity service and the Address service).
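To make the aggregate idea concrete, here is a minimal Kotlin sketch (all names are illustrative, not from a real code base) of a LegalEntity aggregate that owns its Addresses, so they can only be created, changed and deleted together through the aggregate root:

```kotlin
import java.util.UUID

// Minimal, hypothetical model: the LegalEntity aggregate root owns its Addresses.
// Addresses can only be changed through the root, so the pair always changes
// together, within a single transaction, inside a single service.
data class Address(val street: String, val city: String, val postalCode: String)

class LegalEntity(val id: UUID, var name: String) {
    private val addresses = mutableListOf<Address>()

    fun addAddress(address: Address) {
        addresses.add(address)
    }

    fun removeAddress(address: Address) {
        addresses.remove(address)
    }

    fun addresses(): List<Address> = addresses.toList() // read-only view for callers
}
```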
In the case of the Legal Entity the issue of coupling was easily solved, but what happens when you have a more complex set of data and relationships between these data?
We could just pile all of that data into a single service and thereby avoid the problem of having data changes across process boundaries (i.e. between different services that are hosted in other OS processes or on different physical servers). The issue with this approach is that it quickly brings us into monolith territory. There's nothing wrong with monoliths per se. Monoliths can be built using many of the same design principles described here, e.g. as modules/components that are bundled together and deployed as a single unit – whereas microservices often are deployed individually (that's at least one of the major qualities that people talk about in relation to microservices).
Blurry boundaries – the slippery slope of monoliths
One of the problems with monoliths is the risk of blurry boundaries. This is not a design trait of monoliths, but more an empirically proven end result that many monoliths experience: modules bundled closely together, often in the same code base, have a tendency to slowly deteriorate. This often happens because it's just too easy and, at least in the beginning, convenient to add coupling between modules, classes and tables.
The taste of a monolith feels good, especially at the beginning of a project, when problems are fewer and complexity is lower.
Monoliths also have a tendency to take on too many responsibilities in the form of data and functionality/logic.
Monoliths do have some advantages, such as:
- You can take advantage of locality
- You can perform in-memory calls and avoid distributed transactions
- You can perform joins with other components' SQL tables because they're in the same database
- You can take advantage of IDE features such as refactoring, code completion and code search
The flip side of this coin is the risk of higher coupling and lower cohesion. Monoliths tend to form a slippery slope where they slowly grow larger and larger as they take on more responsibilities.
Slowly our monolith's data model grows in size until it finally gets confusing and messy due to lack of cohesion.
This is what I like to refer to as the slippery slope of monoliths:
The slippery slope of a monolith as complexity grows
With monoliths we also risk running into several disadvantages, such as:
- They are hard to adapt to new technology – you often need to rewrite the entire monolith to use new frameworks/languages/technologies (or use complicated solutions such as OSGi)
- Low Reusability
- Functionality of a part cannot be reused alone
- Slow Delivery train
- Introducing a new feature often requires coordination with other features to deliver all of them at the same time
- They grow and grow and grow in size and responsibilities
- Coupling increases over time
- Higher and higher maintenance cost over time
- Starting the application often takes a long time
- Testing the application often takes a long time
- Monoliths place high demands on mental capacity, since you need to keep the entire monolith in your head
- The failure of one thing can potentially bring the entire monolith down (e.g. due to OutOfMemoryException)
You can design monoliths with internal modules/services/components that have loose coupling and well defined boundaries, but in my 20 years of experience these are rare cases. The big ball of mud is usually the norm. As we will see later in the blog series, there's a way to meet in the middle, where we can combine the advantages of microservices and monoliths.
Integration as a bunch of webservices
In my experience many organisations approach SOA or Microservices by bolting (web)services on top of existing monoliths. This can definitely make sense as a path to getting a higher degree of reuse out of old monoliths.
The problem with this is that most monoliths have evolved to contain many different business capabilities. This means that companies end up with multi-master setups, where many systems own similar or the same business data and there is no real single source of truth.
Later in this blog series we will look into how we should approach integrating with existing Systems of Record, such as ERPs, CMSs or other 3rd party software where the boundaries are fixed and cannot be modified.
For now we will focus on what approach we can take if we're on a path to breaking up old monoliths into small parts, which could be implemented as microservices.
Monolithic integration by a bunch of webservices
If we just take the existing monolith and carve it up into small (micro)services, then we also need to deal with the internal coupling of the monolith. The internal coupling is typically the result of inheritance, direct method calls, SQL joins, etc.
If this is our approach to creating (Micro)services then we’ve just gone from bad to worse.
Monolith sliced up into microservices
All of this is the result of weak or blurry service boundaries. We have services that are needy and greedy with regard to data and functionality in other services. In my opinion this is the opposite of loose coupling.
With this design we basically get a distributed monolith, which combines all the disadvantages of a monolith with all the disadvantages of a distributed system (based primarily around 2 way synchronous communication and shared databases).
Defining Service boundaries
When building new services or carving services out of old monoliths, we need to spend time defining the boundaries of our new services, so we can (slowly, in migration cases) get away from using 2 way communication between our services, except when authority is more important than autonomy – but more about this in a later blog post.
High autonomy is not necessarily the best solution for all cases. There might be cases, such as some reads across many services, where using 2 way communication between services is more cost effective from a development point of view and where the lack of autonomy is something that the organisation can live with.
In an old monolith supporting a Retail domain we might have collected all functionality and data covering functional areas such as Product Catalogue, Sales, Inventory, Shipping and Billing.
Each of these functional areas could also be called subdomains or business capabilities:
Functional areas in a retail domain
Retail is all about selling Products, so each of these functional areas, or subdomains, will in some way all involve the domain concept Product.
Subdomains might use the same name or they might use a different name for the concept of a Product.
Subdomains are also interested in different data related to Products.
- A Product will exist in the Product Catalogue together with e.g. its name, description, pictures, etc.
- In the Sales subdomain we create Orders for Products, so here we might have OrderLines which reference Products
- In Inventory we're e.g. interested in the Stock Keeping Unit (SKU), Quantity On Hand (QOH) and Location code. In Inventory the name or the picture of the product may be irrelevant. If Inventory workers need them, it would be to aid them in doing their job; it would not be a necessity for handling inventory business logic.
Note: We may or may not use the name Product here; sometimes a Product will be called a Stock Item or a Back Ordered Item, etc., depending on the state of the item/product in relation to our inventory.
- In the Pricing domain we will be interested in pricing strategies for our products. This may also include customer discounts depending on customer statuses (which might be maintained in a CRM monolith/services).
- In the Shipping domain we do not care about QOH, etc. Instead we are interested in the size of a product's packaging, its weight and perhaps the product name for the shipping receipt.
These different perspectives on the same domain concept are what is known as different Bounded Contexts in Domain Driven Design (DDD).
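To illustrate, here is a minimal sketch (illustrative Kotlin, hypothetical names) of how the same real world product could be modelled differently in three bounded contexts:

```kotlin
import java.util.UUID

// Product Catalogue context: presentation oriented data
data class CatalogueProduct(
    val productId: UUID,
    val name: String,
    val description: String,
    val pictureUrls: List<String>
)

// Inventory context: the same physical product, seen as a stock item
data class StockItem(
    val sku: String,            // Stock Keeping Unit
    val productId: UUID,        // shared identity across contexts
    val quantityOnHand: Int,    // QOH
    val locationCode: String
)

// Shipping context: only what is needed to pack and ship the product
data class Parcel(
    val productId: UUID,
    val weightInGrams: Int,
    val widthMm: Int,
    val heightMm: Int,
    val depthMm: Int
)
```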
In a monolith it would be very easy to create a Product table with many attributes/associations and then have all the different subdomains just insert/update and join data as they see fit. The risk is that this Product domain model will become big and have many reasons to change (violating the Single Responsibility Principle) due to the coupling and lack of cohesion.
You can't easily change the Product table layout since so many depend on it. Splitting such a code base into services, databases and service contracts basically just removes the technical coupling – the fact that a service still needs data and functionality from other services will decrease our new services' autonomy to a level that may be unacceptable.
Designing service boundaries around business capabilities
We need a way to design our service boundaries so our services don’t need to talk to each other using 2 way communication in order to fetch data or invoke functionality.
We could start by building our services around functional areas, aka bounded contexts/business capabilities, and use those as our boundaries.
- This means that our service owns the data and functionality that belong to the given boundary.
- This should remove most layered and temporal coupling, as data and logic reside in the same service.
- No service is allowed to own data that belongs to another service.
- There can only be one master of the data. With this guarantee in place we can trust our service to be the single source of truth with regards to all of its business data.
By doing this we should ensure that our service only needs to change if the business functions that it's responsible for change.
This is also known as the Single Responsibility Principle (SRP) for services. You can read a good discussion about this here and here.
Note: The example below is meant as the first step in the approach to building more loosely coupled services. Defining service boundaries is not easy and in the next couple of blog posts I will dig deeper into how we can define better aligned service boundaries than what we get from the rudimentary approach described here.
Let’s start with the Product Catalogue service. Here we will store our single source of truth in relation to the Product aggregate, such as: name, id (remember an aggregate needs a unique id), pictures, description, etc.
In the Sales service we work with Orders, where an Order has multiple OrderLines, which reflect the products a customer has ordered through the web shop.
In Sales we don’t need the name of the product because we don’t have any business rules that relate to the name of the Product. If we had such a rule then chances are that the name of the Product would belong in the Sales service instead of in the Product Catalogue.
With this our boundaries and service models might look like this (simplified):
Simple service data models
The two service domain models above represent two very clean data models which have a high degree of cohesion and low coupling. The only coupling between the two is that OrderLines reference Products by id (remember the rule from part 2 which says aggregates reference each other by ID).
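Expressed as code, the two models could look like this minimal sketch (illustrative names). Notice that the Sales model holds only the Product's id, never an object reference into the other service:

```kotlin
import java.math.BigDecimal
import java.util.UUID

// Product Catalogue service – single source of truth for product data
data class Product(
    val id: UUID,
    val name: String,
    val description: String,
    val pictureUrls: List<String>
)

// Sales service – references Products by id only (the rule from part 2)
data class OrderLine(
    val productId: UUID,     // cross-aggregate reference by id, not by object
    val quantity: Int,
    val unitPrice: BigDecimal
)

data class Order(
    val id: UUID,
    val orderLines: List<OrderLine>
)
```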
The WebShop UI, which is a client of many services, is responsible for displaying the products for sale, their names, pictures and the price the customer needs to pay for each product, etc.
When the customer has finished adding products to their shopping basket, the WebShop will send a command message to the Sales service containing the quantities and product ids for all the products the customer wants to buy. In a later blog post we will look at how we can take advantage of composite UIs to ensure a low degree of coupling in the WebShop, but for now let us just assume that the WebShop is a client which communicates with each of the services using 2 way communication.
As long as the Sales service is provided with the quantities and Product ids, it can create Orders and add OrderLines without needing to talk to the Product Catalogue service.
Note: here we assume that the Sales service owns data related to the Product unit price and adds this information to the OrderLine.
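The command message from the WebShop could be as simple as this sketch, reusing the Order and OrderLine types from the previous sketch (all names are illustrative). The Sales service enriches each line with the unit price it already owns, so no call to the Product Catalogue is needed:

```kotlin
import java.math.BigDecimal
import java.util.UUID

// Command message sent from the WebShop to the Sales service (1 way, fire-and-forget)
data class PlaceOrder(
    val orderId: UUID,
    val lines: List<Line>
) {
    data class Line(val productId: UUID, val quantity: Int)
}

// Inside the Sales service: the unit prices are assumed to live in the
// Sales service itself, so the Order can be created without 2 way calls.
class SalesService(private val unitPrices: Map<UUID, BigDecimal>) {
    fun handle(command: PlaceOrder): Order {
        val orderLines = command.lines.map { line ->
            OrderLine(line.productId, line.quantity, unitPrices.getValue(line.productId))
        }
        return Order(command.orderId, orderLines)
    }
}
```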
But what happens when the Sales service wants to send the customer an Order confirmation email?
A customer that receives an Order confirmation is naturally interested in seeing more than just prices, quantities and Product ids. At the very least the customer wants to know the names of the products ordered.
How should the Sales service get hold of the Product’s name from the Product catalogue while preparing the Order Confirmation email?
Let’s look at some options the Sales service has available:
- The most common approach: the Sales service could use 2 way communication to call the Product Catalogue service for each OrderLine in the Order (either as a call per OrderLine or a batch call that collects information for all Products referenced among the OrderLines) – a minimal sketch of this synchronous approach follows after this list.
- This means that the Sales service now has a stronger contractual and temporal coupling to the Product Catalogue service. The Sales service knows which operations and what data the Product Catalogue service offers.
- This means that whenever the Product Catalogue service changes in a non-backwards-compatible way, the Sales service also needs to change, even if the Sales service doesn't care about the change. Alternatively, the Product Catalogue service needs to version its contracts.
- This problem can be resolved somewhat if the Product Catalogue service offers consumer driven contracts, where clients of the service, e.g. the Sales service, determine what their contracts should look like. This adds more work to the team that runs the Product Catalogue service.
- If the Product Catalogue service is down, then the Sales service can’t create Order Confirmations due to the temporal coupling. This might not be a big issue since Order Confirmations aren’t time critical or directly exposed to a customer that’s waiting for UI feedback.
- Added 28th of February 2015: There are other approaches to SOA that differ from the Autonomous Service approach I'm describing here. One approach worth mentioning views services as neither being autonomous nor owning any business data; instead, services in this approach expose intentional interfaces and are responsible for coordinating interaction between different Systems of Record (SoR). Using this approach, as I understand it, the Product Catalogue (service) and Sales (service) would instead be classified as SoRs and would still be autonomous. Instead a new coordinating Service that owns the "Send Order Confirmation Email" use case would be introduced. This service would call both the Product Catalogue SoR and the Sales SoR and fetch Order information and Product information to complete the Order Confirmation composition. The service's operation might still be triggered by an event. This seems very similar to the concept of IT-Ops that Udi Dahan talks about here
- The Product Catalogue service UI is mashed into the Order Confirmation process.
- This is a more subtle and much weaker form of coupling, because the Sales service doesn't need to know any of the data inside, nor the contract of, the Product Catalogue service, except for a very small shared rendering context defined by the UI (typically only the id of the Product).
- Service mashup still involves temporal coupling between our services
- I will get back to Composite UI/Service mashup in a later blog post
- The final option would be that the Sales service contains cached/duplicated data from the Product Catalogue. This could be accomplished, without incurring temporal coupling to the Product Catalogue service, by using Data duplication over Events.
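To make the coupling in the first option concrete, here is a minimal sketch of the synchronous lookup (the endpoint URL and payload handling are hypothetical). The Sales service now depends both on the Product Catalogue's contract and on it being available at the time of the call:

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse
import java.util.UUID

// Option 1 sketch: a synchronous 2 way call from Sales to the Product Catalogue.
// If the Product Catalogue is down or slow, building the Order Confirmation
// fails or stalls with it – the temporal coupling discussed above.
class ProductCatalogueClient(private val baseUrl: String) {
    private val http = HttpClient.newHttpClient()

    fun fetchProductNames(productIds: List<UUID>): String {
        val ids = productIds.joinToString(",")
        val request = HttpRequest.newBuilder(
            URI.create("$baseUrl/products?ids=$ids") // hypothetical endpoint
        ).GET().build()
        // Blocking call: Sales cannot finish the confirmation until this returns
        val response = http.send(request, HttpResponse.BodyHandlers.ofString())
        return response.body() // JSON parsing omitted for brevity
    }
}
```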
Data duplication over Events
When Products are added, changed or removed from the Product Catalogue we can notify other services of this fact using Business Events.
In this case the Product Catalogue service is so simple that the business events would resemble Create/Update/Delete (CUD) Events: ProductAdded, ProductUpdated and ProductDeleted.
Notice that the events are named in the past tense, which is an important characteristic of events: they have occurred and they represent facts.
We could make the Sales service listen for these events over a Message Channel (e.g. in Publish/Subscribe style) and let the Sales service build up its own internal representation of Products with just the data it's interested in:
Data duplication over events
This will result in the following Service data models:
Service models with events and data duplication
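A minimal sketch of what the Sales service's subscriber could look like (event and type names are illustrative; a real implementation would receive the events from a message broker):

```kotlin
import java.util.UUID
import java.util.concurrent.ConcurrentHashMap

// CUD style business events published by the Product Catalogue service.
// Named in the past tense: they are facts that have already happened.
sealed interface ProductEvent { val productId: UUID }
data class ProductAdded(override val productId: UUID, val name: String) : ProductEvent
data class ProductUpdated(override val productId: UUID, val name: String) : ProductEvent
data class ProductDeleted(override val productId: UUID) : ProductEvent

// Inside the Sales service: a subscriber that maintains the service's own
// internal representation of Products – just the data Sales cares about.
class ProductNameProjection {
    private val names = ConcurrentHashMap<UUID, String>()

    fun on(event: ProductEvent) {
        when (event) {
            is ProductAdded -> names[event.productId] = event.name
            is ProductUpdated -> names[event.productId] = event.name
            is ProductDeleted -> names.remove(event.productId)
        }
    }

    // Used when composing the Order Confirmation email – no 2 way call needed
    fun nameOf(productId: UUID): String? = names[productId]
}
```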
By using data duplication in this way we have gained the following advantages:
- There's still a clear owner of the data: the Product Catalogue service owns the data and will notify subscribing services when the data changes.
- This form of data caching is better than most traditional caching mechanisms, where you typically lack any notification or indication from the owner of the data that the cached data is invalid. With events you're notified as soon as the data changes.
- The contractual coupling is lower. You're only bound to event contracts, which contain only data. An event contract is therefore much simpler than a classical WSDL service contract that has both data and functions. Experience shows that event contracts tend to change less often than normal functionality hungry contracts. Still, event contracts have to be designed with forward compatibility in mind, so it's possible to add new non-mandatory fields without causing existing subscribers to fail.
- The degree of coupling between the Product Catalogue service and Sales services is lower.
- The Sales service only needs to know the event contracts and the message channel
- The Product Catalogue service doesn't have any coupling to the Sales service. It doesn't know what the Sales service intends to do with the events it receives – in fact the Product Catalogue service doesn't even know which services get its events.
- You've broken the temporal and technical coupling at the expense of being eventually consistent.
- This follows nicely from the learnings in Pat Helland's "Life Beyond Distributed Transactions – An Apostate's Opinion" (original article from 2007; an updated and abbreviated version was published in 2016). In this paper he concludes that you can only be consistent within a single Aggregate instance (i.e. within a single transaction and therefore within a single service), whereas you have to be eventually consistent between aggregate instances (i.e. between services, and also between individual transactions inside a single service), because we have no way of ensuring consistency between them unless we're ready to pay a very high price and use distributed transactions.
- In this case eventual consistency means that if the Message Channel is unavailable or unable to deliver messages to the Sales service, then we might write out product names in the Order Confirmation even though they have changed. As soon as the message channel is back up, the Sales service will catch up with the Product Catalogue service. Being eventually consistent is actually the norm when you use caching, whether you use Events or not.
- We can make the eventual consistency problem smaller by anchoring events to time. This can be done using the event name and data. In the data you could inform the recipient how long into the future the values are valid and therefore cacheable (e.g. prices might only change once a day, product names rarely change, etc.) – see the small sketch after this list.
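Anchoring an event to time could look like this small sketch, where the validUntil field (an illustrative assumption) tells subscribers how long the value can safely be cached:

```kotlin
import java.math.BigDecimal
import java.time.Instant
import java.util.UUID

// Sketch: the publisher tells subscribers how long the value can be cached.
// E.g. prices that only change once a day can safely be cached until then.
data class ProductPriceChanged(
    val productId: UUID,
    val newPrice: BigDecimal,
    val validUntil: Instant // cacheable-until hint, anchoring the event to time
)
```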
Sceptics might look at Data duplication over Events and say that it looks like a lot of work for something that could be achieved with existing database technologies. And if that's all you use events for, then they're not entirely wrong. Using Data duplication over Events is also not without its complexities, such as monitoring, channel setup, I/O overhead, and an added memory and storage footprint for the service(s) that duplicate the data.
Using Data duplication over Events is a well known, technology neutral pattern for slowly separating monoliths into autonomous services, but it's not the final solution for event based integration.
We can go even further and reap more benefits. We can use Events to drive Business processes.
Using business events to drive business processes across services
If we elevate events from CUD (Create/Update/Delete) events to real business events that reflect the state changes (or facts) in our aggregates, then we can use these events to drive business processes through choreography. This is a purely event driven way of coordinating a business process, as opposed to using a (centralised) coordinator, also known as an orchestrator, to coordinate our business processes (typically) using 2 way communication.
Let's look at how we could drive the Order fulfilment process using Events. In the WebShop the customer presses the Accept Order button, which triggers an AcceptOrder command message to be sent to the Sales service:
The OrderAccepted event triggers the Order fulfilment process
The AcceptOrder command results in a state change in the Order aggregate instance, which transitions into the state Accepted.
This state change (or fact) is communicated to all interested services as an OrderAccepted event – we're stating the fact that the Order has been Accepted, which is irreversible (we can compensate, but we cannot roll back this change after the transaction has been committed).
The Sales service doesn't know who's interested in the event, but at the process level we have rehearsed our Order fulfilment process and agreed on which services should react to the OrderAccepted event.
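Sketched in code, the command handling and the resulting event could look like this (illustrative names; the publish function stands in for whatever message channel is used):

```kotlin
import java.util.UUID

data class AcceptOrder(val orderId: UUID)   // command: an instruction
data class OrderAccepted(val orderId: UUID) // event: an irreversible fact

class OrderAggregate(val id: UUID) {
    var accepted = false
        private set

    // The state change happens inside the aggregate's own transaction;
    // the returned event states the fact so other services can react to it.
    fun accept(): OrderAccepted {
        check(!accepted) { "Order $id is already accepted" }
        accepted = true
        return OrderAccepted(id)
    }
}

class SalesService(
    private val orders: MutableMap<UUID, OrderAggregate>,
    private val publish: (Any) -> Unit
) {
    fun handle(command: AcceptOrder) {
        val order = orders.getValue(command.orderId) // assumes the Order exists
        val event = order.accept()
        publish(event) // Sales doesn't know (or care) who subscribes to the event
    }
}
```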
This reactive architecture style is known as Event Driven Architecture (EDA).
With the EDA interaction style, the services themselves determine what to do when an Event occurs. For scenarios where we need to coordinate multiple services, e.g. to make sure we don't perform any shipping until the customer has been billed and all items are in stock (or whatever the criteria for shipping might be), we will introduce a new Aggregate that is responsible for the Order fulfilment business process and capability. Whether this new “process” aggregate belongs within the Shipping service or is a standalone service (as depicted below) is not so important right now. The important thing is that we have identified a central business capability to which we can explicitly assign the responsibility.
Such a process aggregate can be implemented/supported by a Process Manager or a Saga (as it’s called in Rebus and NServiceBus). The process manager can choose to instruct other services on what to do (i.e. partial orchestration) if it needs to, but in general a lot can be handled using events alone.
We will in a later blog post get into when to favour other message types, such as Command messages or Documents, over Event messages.
In the example below the Order fulfilment service awaits two events, OrderAccepted and CustomerBilled, before it publishes the OrderReadyForShipping event (in this case we could also have sent a ShipOrder command to the Shipping Service, but let’s stick with events for now).
Coordinating two events requires that they contain enough information to reveal that they're related to the same Order fulfilment process instance. Here I will anchor the events to the Order they relate to, which means that each Event contains the id of the Order it relates to.
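A minimal process manager sketch, in the spirit of a Rebus/NServiceBus Saga but with illustrative Kotlin names: each event carries the Order id, which is used to correlate the events to the right process instance, and once both events have arrived the OrderReadyForShipping event is published:

```kotlin
import java.util.UUID

data class OrderAccepted(val orderId: UUID)
data class CustomerBilled(val orderId: UUID)
data class OrderReadyForShipping(val orderId: UUID)

// One process instance per Order; the orderId in each event is the
// correlation id that ties the events to the right instance.
class OrderFulfilmentProcess(private val publish: (Any) -> Unit) {
    private data class State(var accepted: Boolean = false, var billed: Boolean = false)
    private val instances = mutableMapOf<UUID, State>()

    fun on(event: OrderAccepted) {
        instances.getOrPut(event.orderId) { State() }.accepted = true
        maybeComplete(event.orderId)
    }

    fun on(event: CustomerBilled) {
        instances.getOrPut(event.orderId) { State() }.billed = true
        maybeComplete(event.orderId)
    }

    private fun maybeComplete(orderId: UUID) {
        val state = instances.getValue(orderId)
        if (state.accepted && state.billed) {
            publish(OrderReadyForShipping(orderId)) // the choreography continues
            instances.remove(orderId)               // this process instance is done
        }
    }
}
```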
Process coordination using events is known as Choreography, and it should be seen as a supplement to the more traditional (BPEL inspired) Orchestration approach, where the orchestrator instructs services about what to do.
Any real life solution will use a combination of both approaches.
The choreographed order fulfilment process
There's much more to say about Event Driven Architecture and Service boundary definition, but this blog post is already long enough, so that will have to wait until next time.