Microservices: It’s not (only) the size that matters, it’s (also) how you use them – part 4

Part 1 – Microservices: It’s not (only) the size that matters, it’s (also) how you use them
Part 2 – Microservices: It’s not (only) the size that matters, it’s (also) how you use them
Part 3 – Microservices: It’s not (only) the size that matters, it’s (also) how you use them
Part 5 – Microservices: It’s not (only) the size that matters, it’s (also) how you use them
Part 6 – Service vs Components vs Microservices

Text updated on the 27th of June 2021

In part 3 we saw that, in order to ensure a higher degree of autonomy for our services, we need to avoid (synchronous) 2 way communication (RPC/REST/etc.) between services and instead use 1 way communication.

A higher level of autonomy goes hand in hand with a lower degree of coupling. The less coupling we have, the less we need to bother with contract and data versioning.
We also increase our services’ stability – a failure in other services doesn’t directly affect our service’s ability to respond to stimuli.

But how can we get any work done if we only use 1 way communication? How can we get any data back from other services this way?
The short answer is that you can’t, but with well defined Service Boundaries you (in most cases) shouldn’t need to call other services directly from your service to get data back.

Service boundaries

What is a service boundary?
It’s basically a term used to define the business data and functionality that a Service is responsible for. In Microservices: synchronous communication, data ownership and coupling we covered Service principles such as Boundaries and Autonomy in detail.
Boundaries determine what’s inside and outside of a Service. In part 2 we used the aggregate pattern to analyse which data belonged inside the Legal Entity service.
In the case of the Legal Entity service we realised that Legal Entity and its associated Addresses belonged together because they were created, changed and deleted together. By replacing two services with one we gained full autonomy for the Legal Entity service, whereby we could avoid the need for orchestration and for handling all the error scenarios that can result from orchestrating data-changing calls between services (the LegalEntity service and the Address service).

In the case of the Legal Entity the issue of coupling was easily solved, but what happens when you have a more complex set of data and relationships between those data?
We could just pile all of that data into a single service and thereby avoid the problem of having data changes across process boundaries (i.e. between different services that are hosted in other OS processes or on different physical servers). The issue with this approach is that it quickly brings us into monolith territory. There’s nothing wrong with monoliths per se. Monoliths can be built using many of the same design principles described here, e.g. as modules/components that are bundled together and deployed as a single unit – whereas microservices are often deployed individually (that’s at least one of the major qualities that people talk about in relation to microservices).

Blurry boundaries – the slippery slope of monoliths

One of the problems with monoliths is the risk of blurry boundaries. This is not a design trait of monoliths, but more an empirically proven end result that many monoliths experience, where modules bundled closely together, often in the same code base, have a tendency to slowly deteriorate. This often happens because it’s just too easy and, at least in the beginning, convenient to add coupling between modules, classes and tables.

The taste of a monolith feels good, especially at the beginning of a project, when problems are fewer and complexity is lower.
Monoliths also have a tendency to take on too many responsibilities in the form of data and functionality/logic.

Monoliths do have some advantages, such as:

  • You can take advantage of locality
    • Perform in-memory calls and avoid distributed transactions
    • Perform Joins with other components’ SQL tables because they’re in the same DB
  • You can take advantage of development IDEs and use features such as refactoring, code completion and code searching

The flip side of this coin is the risk of higher coupling and lower cohesion. Monoliths tend to form a slippery slope where they slowly grow larger and larger as they take on more responsibilities.

[Image: Slowly our monolith’s data model grows in size until it finally gets confusing and messy due to lack of cohesion]

This is what I like to refer to as the slippery slope of monoliths:

[Image: The slippery slope of a monolith as complexity grows]

With monoliths we also risk running into several disadvantages, such as:

  • They are hard to adapt to new technology – you often need to rewrite the entire monolith to use new frameworks/languages/technologies (or use complicated solutions such as OSGi)
  • Low Reusability
    • Functionality of a part cannot be reused alone
  • Slow Delivery train
    • Introducing a new feature often requires coordination with other features to deliver all of them at the same time
  • They grow and grow and grow in size and responsibilities
  • Coupling increases over time
  • Higher and higher maintenance cost over time
  • Starting the application often takes a long time
  • Testing the application often takes a long time
  • Monoliths place high demands on mental capacity, since you need to keep the entire monolith in your head
  • Reliability
    • The failure of one thing can potentially bring the entire monolith down (e.g. due to OutOfMemoryException)

You can design monoliths with internal modules/services/components that have loose coupling and well defined boundaries, but in my 20 years of experience these are rare cases. A big ball of mud is usually the norm. As we will see later in the blog series, there’s a way to meet in the middle, where we can combine the advantages of microservices and monoliths.

Integration as a bunch of webservices

From my experience, many organisations approach SOA or Microservices by taking the path of bolting (web)services on top of existing monoliths. This can definitely make sense as a path to achieving a higher degree of reuse for old monoliths.
The problem with this is that most monoliths have evolved to contain many different business capabilities. This means that companies end up with multi-master systems, where many systems own similar or the same business data, and there is no real single source of truth.

Later in this blog series we will look into how we should approach integrating with existing Systems of Record, such as ERPs, CMSs or other 3rd party software where the boundaries are fixed and cannot be modified.
For now we will focus on what approach we can take if we’re on a path to breaking up old monoliths into small parts, which could be implemented as microservices.

[Image: Monolithic integration by a bunch of webservices]

If we just take the existing monolith and carve it up into small (micro)services, then we also need to deal with the internal coupling of the monolith. The internal coupling is typically the result of inheritance, direct method calls, SQL joins, etc.
If this is our approach to creating (Micro)services then we’ve just gone from bad to worse.

[Image: Monolith sliced up into microservices]

All of this is the result of weak or blurry service boundaries. We have services that are needy and greedy with regards to data and functionality in other services. In my opinion this is the opposite of loose coupling.

With this design we basically get a distributed monolith, which shares all the disadvantages of a monolith combined with all the disadvantages of a distributed system (based primarily around 2 way synchronous communication and shared databases).

Defining Service boundaries

When building new services or carving out services from old monoliths, we need to spend time defining the boundaries of our new services, so that we can (slowly, in migration cases) get away from using 2 way communication between our services – except when authority is more important than autonomy, but more about this in a later blog post.

High autonomy is not necessarily the best solution for all cases. There might be cases, such as some reads across many services, where using 2 way communication between services is more cost effective from a development point of view and where the lack of autonomy is something that the organisation can live with.

Example

In an old monolith supporting a Retail domain we might have collected all functionality and data covering functional areas such as Product Catalogue, Sales, Inventory, Shipping and Billing.
Each of these functional areas could also be called subdomains or business capabilities:

[Image: Functional areas in a retail domain]

Retail is all about selling Products, so each of these functional areas, or subdomains, will in some way involve the domain concept Product.

Subdomains might use the same name or they might use different names for the concept of a Product. Subdomains are also interested in different data related to Products.

For example:

  • A Product will exist in the Product Catalogue together with e.g. its name, description, pictures, etc.
  • In the Sales subdomain we create Orders for Products, so here we might have OrderLines which reference Products
  • In Inventory we’re e.g. interested in the Stock Keeping Unit (SKU), Quantity On Hand (QOH) and Location code. In Inventory the name or the picture of the product may be irrelevant. If they’re needed at all, it would be to aid Inventory workers in doing their job; they would not be a necessity for handling inventory business logic.
    Note: We may or may not use the name Product here; sometimes a Product will be called a Stock Item or a Back Ordered Item, etc., depending on the state of the item/product in relation to our inventory.
  • In the Pricing domain we will be interested in pricing strategies for our products. This may also include customer discounts depending on customer statuses (which might be maintained in a CRM monolith/service).
  • In the Shipping domain we do not care about QOH, etc. Instead we are interested in the size of a product’s packaging, its weight and perhaps the product name for the shipping receipt.

These different perspectives on domain concepts within a domain are what is known as different Bounded Contexts in Domain Driven Design (DDD).
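
To make this concrete, here is a minimal sketch in Java of how the same real-world product might be modelled in two of these bounded contexts. All class and field names are illustrative assumptions, not taken from a real code base:

```java
import java.util.List;

// Shared identity: contexts refer to the same real-world product by id only
record ProductId(String value) {}

// The Product as the Product Catalogue context models it
class CatalogueProduct {
    ProductId id;
    String name;
    String description;
    List<String> pictureUrls;
}

// The "same" product as the Inventory context models it
class StockItem {
    ProductId id;          // same identity, completely different model
    String sku;            // Stock Keeping Unit
    int quantityOnHand;    // QOH
    String locationCode;
    // No name and no pictures: Inventory has no business rules that need them
}
```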

In a monolith it would be very easy to create a Product table with many attributes/associations and then have all the different subdomains just insert/update and join data as they see fit. The risk is that this Product domain model will become big and will have many reasons to change (violating the Single Responsibility Principle) due to the coupling and lack of cohesion.
You can’t easily change the Product table layout since so many depend on it. Splitting such a code base into services, databases and service contracts basically just removes the technical coupling – the fact that a service still needs data and functionality from other services will decrease our new services’ autonomy to a level that may be unacceptable.

Defining Service Boundaries

We need a way to design our service boundaries so our services don’t need to talk to each other using 2 way communication in order to fetch data or invoke functionality.

We could start by building our services around functional areas, aka bounded contexts/business capabilities, and use those as our boundaries.

  • This means that our service owns the data and functionality that belong to the given boundary.
    • This should remove most layered and temporal coupling, as data and logic reside in the same service.
  • A service isn’t allowed to own data that belongs to another service.
  • There can only be one master of the data. With this guarantee in place we can trust our service to be the single source of truth with regards to all of its business data.

By doing this we should ensure that our service only needs to respond to changes if the business functions it’s responsible for change.
This is also known as the Single Responsibility Principle (SRP) for services. You can read a good discussion about this here and here.

Another example

Note: The example below is meant as the first step in the approach to building more loosely coupled services. Defining service boundaries is not easy and in the next couple of blog posts I will dig deeper into how we can define better aligned service boundaries than what we get from the rudimentary approach described here.

Let’s start with the Product Catalogue service. Here we will store our single source of truth in relation to the Product aggregate, such as:  name, id (remember an aggregate needs a unique id), pictures, description, etc.

In the Sales service we work with Orders, where an Order has multiple OrderLines that reflect the products a customer has ordered through the web shop.
In Sales we don’t need the name of the product because we don’t have any business rules that relate to the name of the Product. If we had such a rule, then chances are that the name of the Product would belong in the Sales service instead of in the Product Catalogue.

With this our boundaries and service models might look like this (simplified):

[Image: Simple service data models]

The two service domain models above represent two very clean data models with a high degree of cohesion and low coupling. The only coupling between the two is that OrderLines reference Products by id (remember the rule from part 2, which says aggregates reference each other by ID).
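
As a minimal sketch (again with invented Java names, reusing the ProductId value type from the earlier sketch), the product id held by each OrderLine is that single coupling point:

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// An OrderLine references the Product aggregate by id only
record OrderLine(ProductId productId, int quantity, BigDecimal unitPrice) {}

class Order {
    private final String orderId;
    private final List<OrderLine> lines = new ArrayList<>();

    Order(String orderId) { this.orderId = orderId; }

    // Sales can add a line knowing only the product id, quantity and unit price;
    // no call to the Product Catalogue service is needed
    void addLine(ProductId productId, int quantity, BigDecimal unitPrice) {
        lines.add(new OrderLine(productId, quantity, unitPrice));
    }
}
```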

The WebShop UI, which is a client of many services, is responsible for displaying the products for sale, their names, pictures and the price the customer needs to pay for each product, etc.
When the customer has finished adding products to their Shopping basket, the WebShop will send a command message to the Sales service containing the quantity and product ids for all the products the customer wants to buy. In a later blog post we will look at how we can take advantage of composite UIs to ensure a low degree of coupling in the WebShop, but for now let us just assume that the WebShop is the client which communicates with each of the services using 2 way communication.

As long as the Sales service is provided with the quantity and Product id, it can create Orders and add OrderLines without needing to talk to the Product Catalogue service.
Note: here we assume that the Sales service has data related to the Product Unit-Price and adds this information to the OrderLine.

Confirmation email

But what happens when the Sales service wants to send the customer an Order confirmation email?
A customer that receives an Order confirmation is naturally interested in seeing more than just prices, quantities and Product ids. At the very least, the customer wants to know the names of the products ordered.

How should the Sales service get hold of the Product’s name from the Product catalogue while preparing the Order Confirmation email?
Let’s look at some options the Sales service has available:

  • The most common approach: The Sales service could use 2 way communication to call the Product Catalogue Service for each OrderLine in the Order (either as a call for each OrderLine or a batch call that collects information for all Products referenced among the OrderLines)
    • This means that the Sales service now has a stronger contractual and temporal coupling to the Product Catalogue service. The Sales service knows which operations and what data the Product Catalogue service offers.
    • This means that whenever the Product Catalogue service changes in a non-backwards-compatible way, the Sales service also needs to change, even if the Sales service didn’t care about the change. Alternatively the Product Catalogue service needs to version its contracts.
    • This problem can be resolved somewhat if the Product Catalogue service offered consumer driven contracts, where clients of the service, e.g. the Sales service, determine what their contracts should look like. This adds more work for the team that runs the Product Catalogue service.
    • If the Product Catalogue service is down, then the Sales service can’t create Order Confirmations due to the temporal coupling. This might not be a big issue since Order Confirmations aren’t time critical or directly exposed to a customer that’s waiting for UI feedback.
  • Added 28th of February 2015: There are other approaches to SOA that are different from the Autonomous Service approach I’m describing here. One approach worth mentioning views services as not being autonomous nor owning any business data; instead, Services in this approach expose intentional interfaces and are responsible for coordinating interaction between different Systems of Record (SoR). Using this approach, as I understand it, the Product Catalogue (service) and Sales (service) would instead be classified as SoRs, and would still be autonomous. Instead, a new coordinating Service that owns the “Send Order Confirmation Email” use case would be introduced. This service would call both the Product Catalogue SoR and the Sales SoR and fetch Order information and Product information to complete the Order Confirmation composition. The service’s operation might still be triggered by an event. This seems very similar to the concept of IT-Ops that Udi Dahan talks about here
  • The Product Catalogue service UI is mashed into the Order Confirmation process.
    • This is a more subtle and much weaker form of coupling because the Sales Service doesn’t need to know any of the data inside nor the contract for the Product catalogue service except for a very small shared rendering context defined by the UI (typically only the id of the Product).
    • Service mashup still involves temporal coupling between our services
    • I will get back to Composite UI/Service mashup in a later blog post
  • The final option would be for the Sales service to contain cached/duplicated data from the Product Catalogue. This could be accomplished, without incurring temporal coupling to the Product Catalogue service, by using Data duplication over Events

Data duplication over Events

When Products are added, changed or removed from the Product Catalogue, we can notify other services of this fact using Business Events.
In this case the Product Catalogue service is so simple that the business events would resemble Create/Update/Delete (CUD) Events: ProductAdded, ProductUpdated and ProductDeleted.
Notice that the events are named in the past tense, which is an important characteristic of events: they have occurred and they represent facts.

We could make the Sales Service listen for these events over a Message Channel (e.g. Publish/Subscribe style) and allow the Sales Service to build up its own internal representation of Products with the data it’s interested in:

[Image: Data duplication over events]

This will result in the following Service data models:

[Image: Service models with events and data duplication]
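
As a sketch of what this could look like (the event fields and the handler are assumptions for illustration, and the pub/sub wiring to a concrete message channel is omitted), the Sales service keeps a local replica holding only the product data it cares about:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Events are named in the past tense: immutable facts that have occurred
record ProductAdded(ProductId productId, String name) {}
record ProductUpdated(ProductId productId, String name) {}
record ProductDeleted(ProductId productId) {}

// Inside the Sales service: a replica containing only the data Sales needs
class ProductNameReplica {
    private final Map<ProductId, String> namesById = new ConcurrentHashMap<>();

    // Invoked by the message channel subscription
    void on(ProductAdded e)   { namesById.put(e.productId(), e.name()); }
    void on(ProductUpdated e) { namesById.put(e.productId(), e.name()); }
    void on(ProductDeleted e) { namesById.remove(e.productId()); }

    // Used when rendering the Order Confirmation email: no remote call needed
    Optional<String> nameOf(ProductId id) {
        return Optional.ofNullable(namesById.get(id));
    }
}
```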

By using data duplication in this way we have gained the following advantages:

  • There’s still a clear owner of the data: the Product Catalogue is the owner of the data and will notify subscribing services when data is changed.
    • This form of data caching is better than most traditional caching mechanisms, where you typically lack any notification or indication from the owner of the data that the cached data is invalid. With events you’re notified as soon as the data is changed
  • The contractual coupling is lower. You’re only bound to event contracts, which only contain data. An event contract is therefore much simpler than classical WSDL service contracts that have both data and functions. Experience shows that event contracts tend to change less often than normal functionality hungry contracts. Still, event contracts have to be designed with forward compatibility in mind, so it’s possible to add new non-mandatory fields without causing existing subscribers to fail.
  • The degree of coupling between the Product Catalogue service and the Sales service is lower.
    • The Sales service only needs to know the event contracts and the message channel
    • The Product Catalogue service doesn’t have any coupling to the Sales service. It doesn’t know what the Sales service intends to do with the events it receives – in fact the Product Catalogue Service doesn’t even know which services get its events.
  • You’ve broken the temporal coupling and technical coupling at the expense of being eventually consistent.
    • This follows nicely from the learnings of Pat Helland’s “Life Beyond Distributed Transactions – An Apostate’s Opinion” (original article from 2007) / “Life Beyond Distributed Transactions – An Apostate’s Opinion” (updated and abbreviated version from 2016). In this paper he concludes that you can only be consistent within a single Aggregate instance (i.e. within a single transaction and therefore within a single service), whereas you have to be eventually consistent between aggregate instances (i.e. between services and also between individual transactions inside a single service), because we have no way of ensuring consistency between them unless we’re ready to pay a very high price and use distributed transactions.
    • In this case eventual consistency means that if the Message Channel is unavailable or unable to deliver messages to the Sales Service, then we might be writing out product names in the Order Confirmation even though they have changed. As soon as the message channel is back up, the Sales service will catch up with the Product Catalogue service. Being eventually consistent is actually the norm when you use caching, whether you use Events or not.
    • We can make the eventual consistency problem smaller by anchoring events to time (see the sketch below). This can be done using the event name and data. In the data you could inform the recipient how long into the future the values are valid and therefore cacheable (e.g. prices might only change once a day, product names rarely change, etc.)
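
A small sketch of that last idea, where the validity fields are my own assumption rather than any standard: the publisher anchors the event to time, so subscribers know how long the value can safely be cached:

```java
import java.math.BigDecimal;
import java.time.Instant;

// Hypothetical event that carries its own validity window
record ProductPriceChanged(
        ProductId productId,
        BigDecimal newPrice,
        Instant validFrom,
        Instant validUntil   // subscribers may cache the price until this instant
) {}
```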

Sceptics might look at Data duplication over Events and say that it looks like a lot of work for something that could be achieved with existing database technologies. And if that’s all you use events for, then they’re not entirely wrong. Using Data duplication over Events is also not without its complexities, such as monitoring, channel setup, I/O overhead, and an added memory and storage footprint for the service(s) that duplicate the data.
Data duplication over Events is a well known, technology neutral pattern for slowly separating monoliths into autonomous services, but it’s not the final solution for event based integration.
We can go even further and reap more benefits: we can use Events to drive Business processes.

Using business events to drive business processes across services

If we elevate events from CUD (Create/Update/Delete) events to real business events that reflect the state changes (or facts) in our aggregates, then we can use these events to drive business processes through choreography. This is a purely event driven way of coordinating a business process, as opposed to using a (centralised) coordinator, also known as an orchestrator, to coordinate our business processes (typically) using 2 way communication.

Let’s look at how we could drive the Order fulfilment process using Events. In the Webshop the customer presses the Accept Order button, which triggers an AcceptOrder command message to be sent to the Sales Service:

[Image: The OrderAccepted event triggers the Order fulfilment process]

The AcceptOrder command results in a state change in the Order aggregate instance, which as a result transitions into the state Accepted.
This state change (or fact) is communicated to all interested services as an OrderAccepted event – we’re stating the fact that the Order has been Accepted, which is irreversible (we can compensate, but we cannot roll back this change after the transaction has been committed).
The Sales service doesn’t know who’s interested in the event, but at the process level we have rehearsed our Order fulfilment process and agreed which services should react to the OrderAccepted event.
This reactive architecture style is known as Event Driven Architecture (EDA).

With the EDA interaction style, the services themselves determine what to do when an Event occurs. For scenarios where we need to coordinate multiple services, e.g. to make sure we don’t perform any shipping until the customer has been billed and all items are in stock (or whatever the criteria for shipping might be), we will introduce a new Aggregate that will be responsible for the Order fulfilment business process and capability. Whether this new “process” aggregate belongs within the Shipping service or is a standalone service (as depicted below) is not so important right now. The important thing is that we have identified a central business capability to which we can explicitly assign the responsibility.

Such a process aggregate can be implemented/supported by a Process Manager or a Saga (as it’s called in Rebus and NServiceBus). The process manager can choose to instruct other services on what to do (i.e. partial orchestration) if it needs to, but in general a lot can be handled using events alone.
In a later blog post we will get into when to favour other message types, such as Command messages or Documents, over Event messages.

In the example below the Order fulfilment service awaits two events, OrderAccepted and CustomerBilled, before it publishes the OrderReadyForShipping event (in this case we could also have sent a ShipOrder command to the Shipping Service, but let’s stick with events for now).
Coordinating two events requires that they contain enough information to reveal that they’re related to the same Order fulfilment process instance. Here I will anchor the events to the Order they relate to, which means that each Event contains the Id of the Order it relates to (see the sketch below).
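
Below is a hand-rolled sketch of such a correlation (all names are invented; saga support in Rebus/NServiceBus additionally gives you state persistence, timeouts, etc.). It waits until both events for the same order id have arrived before publishing OrderReadyForShipping:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical event types, each anchored to an Order via its id
record OrderAccepted(String orderId) {}
record CustomerBilled(String orderId) {}
record OrderReadyForShipping(String orderId) {}

class OrderFulfilmentProcessManager {
    // Per-order process state: which of the two events have we seen so far?
    private static class State { volatile boolean accepted; volatile boolean billed; }

    private final Map<String, State> processes = new ConcurrentHashMap<>();
    private final Consumer<OrderReadyForShipping> publish; // message channel, wiring omitted

    OrderFulfilmentProcessManager(Consumer<OrderReadyForShipping> publish) {
        this.publish = publish;
    }

    void on(OrderAccepted e)  { state(e.orderId()).accepted = true; tryComplete(e.orderId()); }
    void on(CustomerBilled e) { state(e.orderId()).billed = true;   tryComplete(e.orderId()); }

    private State state(String orderId) {
        return processes.computeIfAbsent(orderId, id -> new State());
    }

    private void tryComplete(String orderId) {
        State s = processes.get(orderId);
        // remove(key, value) succeeds for exactly one caller, so we publish at most once
        if (s != null && s.accepted && s.billed && processes.remove(orderId, s)) {
            publish.accept(new OrderReadyForShipping(orderId));
        }
    }
}
```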

Process coordination using events is known as Choreography, and it should be seen as a supplement to the more traditional (BPEL inspired) Orchestration approach, where the orchestrator instructs services about what to do.
Any real-life solution will use a combination of both approaches.

[Image: The choreographed order fulfilment process]

There’s much more to say about Event Driven Architecture and Service boundary definition, but this blog post is already long enough, so that will have to wait until next time.

45 thoughts on “Microservices: It’s not (only) the size that matters, it’s (also) how you use them – part 4”

    1. Explicit coordination IS in the picture in the form of the OrderFulfilment Service, which acts as coordinator/process-manager/saga.
      It is its own service, and thereby external in relation to the other services, because it owns the Order-fulfilment process.

      I think we’re talking about the same thing, unless by external and explicit you mean that it can ONLY live in a centralised ESB as a BPEL/BPMN implemented process. IMO that would be A (but not THE) way of physically implementing process coordination.


      1. Ooops. Sorry, blind spot. Yes, a coordination service is externalised in the last illustration. I think we are talking about the same things, viewed from different perspectives. If all other services are communicating ONLY with the coordination service, then the ESB is only providing the technical connectivity. Business (or Aggregation) connectivity among the other services is provided by the coordination service, thus making the ESB less important in the last illustration. This is the reason for “explicit coordination is still missing”.

        You are right that various coordination techniques may be used, e.g. event-based and template-based. More coordination techniques are in http://improving-bpm-systems.blogspot.ch/2014/03/coordination-techniques-in-bpm.html

        Maybe you will find my blogpost “BPM for developers” of interest: http://improving-bpm-systems.blogspot.ch/2013/04/bpm-for-developers-improve-agility-of.html

        Thanks,
        AS


  1. Hi, nice post again. I look forward to seeing more of your documents over events idea.
    I was wondering about the name used for the final event. Why is this OrderReadyForShipment? In all the other events you are not using the name of the listening service, e.g. OrderAccepted, which ends up with the Billing Service. The last event seems not to stick to this convention. Is a name like OrderFulfilled or OrderCompleted not a better name? A customer-loyalty service could be interested in such an event to elevate the customer’s level.


    1. Excellent question Pascal. This is something I want to address in my next blog post 🙂
      You’re right, the OrderReadyForShipment event is questionable (which I also hinted at when I wrote that it could be replaced by a command message). Its sole purpose is to drive the next phase of the process, which gives it an artificial taste.
      What the event names and the process look like comes down to how the company defines the Sales, Order fulfilment and After Sales processes and how they’re linked and supported by software.
      Should the event instead be called OrderFulfilled or OrderCompleted?
      IMO this raises questions about when the company determines that an Order is fulfilled. Is that when we’ve shipped all the products to the customer or is it before?
      When is an order completed? Is that when the customer’s items have been shipped, when he’s received them in good shape, when no products have been returned within the “money back guarantee” period, or when the product guarantee runs out and the customer can no longer complain about malfunctions?

      In my opinion it’s the processes and how they’re linked and supported that differentiates companies and gives some a competitive advantage over others. It’s not having the best (or same) ERP, the flashiest webshop, the best CRM system, etc. that alone determines if a company will win over its competitors. It’s the synergy that comes from having good and flexible processes that are supported/automated by software.

      Since it’s the business processes that are the key differentiator, it will be an area of constant change, and that’s why we need to take care when we design our processes and services. If they are too rigid and coupled, the company will suffer every time we need to change the process or update a service to do a better or faster job.


  2. How would you implement authentication using this approach? Let’s say the composite UI needs to make authenticated requests to several services. The user aggregate is required to authenticate, and all services need to authenticate – it seems like it needs to be its own service (to avoid sharing of the user data) – but then how do you authenticate without using 2-way synchronous communication? The data replication/caching trick doesn’t work here because eventual consistency isn’t acceptable for authentication.


      1. Hi Peter and Sean

        You’re both on the right track. Authentication is IMO indeed a security service of its own. In DDD terms it will be a Generic Subdomain, which logically belongs inside a service.
        This service will be responsible for authenticating the user. The authorisation part is a little more tricky. In general we can split authorisation into two parts: a technical part and a business part.
        This generic Security service can be responsible for which users are allowed to access which services (i.e. which endpoints can be called or which composites are shown on a given page) and eventually also which messages they’re allowed to send, i.e. purely technical authorisation.
        However, the security service should never be responsible for determining any business rules, like determining if this user is allowed to delete a customer, authorise a payment, see an account, etc. These are business rules/authorisations which belong inside the service responsible for the given business capability. This also means that optimally user-to-role mapping is handled inside each service (and not in e.g. LDAP/AD). Putting all roles inside the LDAP/AD is a slippery slope where more and more logic will tend to creep inside.

        What’s left is how the different services interact with the security service. A classic solution is to pass a security token around with any messages. In a Composite UI the Security service will be responsible for handling login (i.e. authentication). Since the security service is deployed (more about service deployment in a later blog post) together with the other services inside the application (server side and/or client side), they will be able to listen for local security events (e.g. in Javascript or in-process on the server side) and thereby encapsulate their own security handling after the security service has performed the initial authentication/authorisation. There should be no need for RPC.


      2. Hi Jeppe, thanks for the detailed answer. I’m not sure I quite understand how the application server verifies the auth token. You mentioned ‘local security events’, what does this mean exactly? If this will be answered in a future post feel free to leave me hanging!


      3. Sorry for the late answer, I was hoping to have completed part 5 by now and hopefully have answered your question with it.
        I hope to have it done soon.


  3. Another option for the order confirmation email is to have a separate process that listens to Product and OrderAccepted events. This is not a service, but can be viewed as a composite “UI” that is purely responsible for sending the email. This has advantages over storing the information in Sales; the Sales service is simpler and services only ever have to be concerned with the data they own. The disadvantage is another moving part to manage and sync data with.


    1. Hi Jonty

      I agree. This is also what I intended to say with the statement “The Product Catalogue service UI is mashed into the Order Confirmation process”. Your solution description + pros/cons is very nice. Thanks 🙂


  4. Hi Jeppe,

    A really thought provoking blog series, I look forward to the next installment. I did have a question though – the data duplication by events makes a lot of sense but how would you initialize the cache to start with, or would you expect the cache would be persistent and have been in place since events about Products started? Maybe some batch job as part of starting the Sales Service would retrieve and populate a starting state for the cache?

    Thanks,
    Dan


    1. Hi Dan

      Thanks for the compliment 🙂
      With data duplication over events we will always have scenarios where a service needs to be populated with events from other services – e.g. to build up or rebuild its cache. This can be if the cache is not persistent (as you write), if there’s a bug in the cache builder, if additional event data is needed in the cache (e.g. a new requirement), or if it’s a brand new service being introduced.

      There are many ways by which a service can offer its events to others in a bulk fashion. A simple solution is to offer a REST endpoint serving an Atom feed, which allows other services to read the events as a stream. The beauty of this solution is that the same endpoint can be used for Pull based Publish/Subscribe, so you get a solution that can handle both scenarios. Since events are immutable (they never change) we can combine the REST/Atom solution with HTTP caching (e.g. using Varnish) and get a very performant solution. This approach is e.g. used with Greg Young’s EventStore product https://github.com/EventStore/EventStore/wiki/Getting-Started-HTTP#reading-from-a-stream

      Caching is a complex subject: how large is the data set, how many events are there in the data set, how often does the data change, etc. Depending on the size, change frequency and criticality of the data for your own service, you can choose different approaches like batch fetching, just-in-time fetching, etc. Nathan Marz has an interesting piece (albeit from a “single” system scenario) on how to combine a realtime layer and a batch layer (which is often recalculated – to avoid having to deal with the CAP theorem and to fix human/programmer errors), which is not far from what has been described here (real time – handling events as they come in, and batch – rebuilding our cache on an as-needed basis) – http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html

      One way to avoid having to deal with caching is to use the Composite UI/Application pattern, which I will cover in the next blog post.

      /Jeppe


  5. Thanks for the inspiration:

    – #BPM for software architects – from monolith applications to explicit and executable #coordination of #microservices architecture http://improving-bpm-systems.blogspot.ch/2014/08/bpm-for-software-architects-from.html

    – #BPM for the #digital age – Shifting architecture focus from the thing to how the things change together http://improving-bpm-systems.blogspot.ch/2014/08/bpm-for-digital-age-shifting.html

    Thanks,
    AS


  6. Hi Jeppe.

    Absolutely love this series. Thanks so much for writing it. Where is the next blog you keep promising?! 🙂

    One question: in your final example, why would you choose to have a process manager, rather than just have the shipping service wait for an order accepted and customer billed event itself? I.e. Why put this business logic about “when to ship” outside the shipping service? Is it not core to the process of shipping?

    Cheers,

    Graham.


    1. Thanks 🙂
      A busy schedule is the honest reason for not having completed part 5 yet 🙂
      You’re right, in the simple example above the Orderfulfillment service is unnecessary and the shipping service could just as well have listened for the two events. I was building up to a larger scenario involving more services, which required someone other than shipping to handle error scenarios (e.g. if some items are out of stock, should we ship parts of the order, etc.).
      /Jeppe


  7. Hi Jeppe

    We’re now in the process of redesigning our backend implementation and your articles come just in time. They are very helpful and insightful, thank you!

    However, we have some concerns regarding aspects of this way of implementing services, and it would be great if you had some time to comment:

    1) Service scaling.
    What if you need to scale the Billing service to multiple physical instances? This would be a problem since all instances of the service will be subscribed to the topic and they will all receive all events. This means that they should somehow partition events.
    Actually this means that the instances become competing consumers inside the “Billing service group” while staying subscribers with regard to other services. I know that Kafka implements something like this (topic partitioning) but in general this may not be easy to implement. Also it seems that this partitioning strategy with topics imposes additional coupling between services.

    2) Some services’ data shouldn’t be cached locally by the other services (or that cache should be updated synchronously). For example, take authentication. I see that you have an example implementation in the comments with a local (in-process) authentication service. It’s not clear how this service will update its local cache. If it’s done through events you could have a situation where a user has changed his password but the token (in the local cache) isn’t invalidated, and the service will authenticate successfully with the old token. How would you deal with such situations?
    We’re thinking about an implementation where all services live in the DMZ and don’t care about authentication, this being solely the purpose of a REST facade in front of the services.

    3) This is most important for me. You’re saying that two-way communication always imposes high coupling, and IMO this is the key assumption in your articles. But there are techniques where you may loosen this type of coupling with commands, when the transport decides who implements a command. There would be no hard coupling between exactly two distinct services. You may replace the service which implements a command, or split it, or create an adaptor when the service and its commands evolve.

    Sure, you still have two types of coupling in this case:
    – event (message) contract;
    – temporal coupling.
    IMO these are still the same in EDA. Your Order fulfillment service still has a temporal coupling on the “customer billed” event. It will not work while the billing service is down.

    P.S. I also read your slide-share battle with JJ 🙂 But I don’t truly understand his points, maybe because I’m too far from BPEL.


    1. Hi Maga

      Let me see if I can answer some of your questions 🙂

      1) This is, as you mention, non-trivial. With a normal Topic solution and clustered subscribers, each will receive a copy of the message/event.
      Some use adapters to handle this problem, e.g. see https://genericjmsra.java.net/docs/topiccluster/loadbalance.html – IMO this is NOT a particularly good solution.
      Another option is to have only one active subscriber which takes the event and places it onto a queue which then has competing consumers (each of the clustered handlers). In this case you then need to think about failover for the subscriber, which probably needs to be custom implemented (e.g. using Zookeeper), so this solution also incurs additional complexity.
      I’m working on a different solution which involves the notion of logical subscribers, so that each clustered subscriber subscribes using the same logical name (e.g. PaymentReceiver) and the topic load balances between the logical subscribers. This is also a custom (but reusable) implementation.

      2) The authentication service is the technical authority for authentication and (technical) authorisation. A service is a logical construction. The deployment model and the logical model for services don’t need to be 1-to-1. The Authentication service can be deployed in multiple places. The Authentication service can be composed of multiple smaller parts (some call them Autonomous Components, others will claim that these are in fact microservices). The Authentication service can have a backend part (an autonomous component), which is deployed individually from the part that’s responsible for intercepting and authorising e.g. HTTP calls in the REST facade. Both the HTTP intercepting Autonomous Component and the backend Autonomous Component belong to the same service. How they distribute/share data between them is an internal matter (it could be through a shared database, or a read database/in-memory cache which is kept consistent through e.g. internal events, CQRS style)

      3) I’m specifically talking about communication between services, e.g. not between a (UI) client and a service. The issue with 2 way communication between services is that it tends to create a strong behavioural and temporal coupling between the services, e.g. see http://bill-poole.blogspot.dk/2008/03/synchronous-requestreply-is-bad_12.html . The calling service has an expectation that the other service will perform some kind of task, and it will wait for the result. The reason why I discuss it so much is because it seems to be the norm, and IMO it tends to create many very chatty cross service calls. In my experience this just grows with time until there are calls left and right between services and you’re lost in integration spaghetti – your mileage may vary.
      I believe that it’s possible to come up with a service design where 2 way communication between services is more rarely needed (it won’t go away – but if we can minimise it then this will result in less integration and less need for contract versioning) – I’m sure some disagree and that’s fine. Different situations and design preferences call for different solutions. Sometimes 2 way communication is what you need, but then you should try to use request/reply (i.e. asynchronous 2 way) to break the strong temporal coupling – e.g. see http://bill-poole.blogspot.dk/2008/03/asynchronous-requestreply.html
      JJ has a different approach, which I believe is quite suitable for many scenarios, but it’s not my primary way to view and build services.
      You mention Command communication. Commands are inherently one-way (no response), but that doesn’t mean that they can’t be delivered using 2 way communication (e.g. HTTP). I don’t know if this answers your question?

      You also mention “IMO these are still the same in EDA. Your Order fulfilment service still has a temporal coupling on the “customer billed” event. It will not work while the billing service is down.”. It’s true that it won’t do any work until the billing service is up again. You could have designed the Orderfulfilment integration using BPEL and 2 way communication and the result would have been the same. Events don’t really bring much extra to the table here. IMO this is because Order fulfilment is a coordinating service and it cannot remain completely autonomous.

      The EDA part and the temporal decoupling matter more between the Sales service and the rest of the services. The Sales service can still take in orders even though the Billing service/Orderfulfilment service/etc. are down. The event is used to decouple the Sales service from knowing about any of the other services. Different business consistency requirements would have resulted in a different service/process design, where the Sales service might have needed to call other services. EDA is not a magical cure for everything (not a silver bullet). I believe any SOA solution will use EDA, one way communication, Orchestration and 2 way communication (Request/Response or Request/Reply). The choice/balance between them should be determined by the requirements.

      I hope I’ve answered your questions ? 🙂

      /Jeppe


      1. Hi Jeppe!

        2) So actually in such a logical authentication service there is little chance that you’ll have one-way communication. It’s just encapsulated inside a service client used in-proc.

        3) My use of the command term is incorrect here, I mean request/reply, and speaking about lowering coupling, one possible way is async request/reply as you mentioned.

        I think I got your ideas more or less.
        Thank you for the comments and I’m also waiting for the 5th part 🙂


      2. 2) Yes – everything that falls under IT operations (like external communication, security, etc.) typically involves a great deal of in-proc request/response orchestration (basically operation calls).


      3. One more thought about scaling.

        > I’m working on a different solution which involves the notion of logical subscribers, so the each clustered subscriber subscribes using the same logical name (e.g. PaymentReceiver) and then the topic load balances between the logical subscribers.

        This would be especially hard if applied to the orchestration part (the Orderfulfilment service). Events can’t be routed randomly and should be partitioned by user ID. This means that the “Order Accepted” and “Customer Billed” events for a specific user should be consumed by one specific instance of the Orderfulfilment service, which may have state.


      4. I guess it depends greatly on whether state sticks to only certain Orderfulfilment service instances (e.g. due to sharding in a multi tenant scenario) and how to determine which instance to use.
        In simple cases where each instance caters to only one User/Tenant, the logical name could contain the user id – e.g. “PaymentReceiver-UserX”.
        If it’s more involved then we need to make the topic sharding-aware (like in databases that support sharding – we need some consistent way to determine which shard to use for a certain piece of data) OR, perhaps simpler and better, figure out a different way of sharding (e.g. at the database level) so instances can handle events and state from any user. It really depends on your needs and reasons for sharding/having multiple service instances.

        /Jeppe


  8. Hi Jeppe,

    This is a really great and comprehensive series on SOA, DDD and Microservices under the hood. I must admit, I haven’t seen such valuable information in one place. You have managed to fit in much more than the above 3 concepts. After reading your blog I realized lots of things that I didn’t understand.

    And a small question. I have started to create a unified data model for a bunch of services. In simple terms, it is a Dictionary of all fields (thousands of them) and their datatypes in a big domain (the idea is much like a WSDL document, but made in house), packed in 1 Java library and reused across all services. Plus a big legacy database having all these fields for persistence.

    The main idea was to:
    – Eliminate redundant transformations across services;
    – Easily change data model from a single place for all of them.

    I’m not sure whether it is called a canonical data model. Can you please comment on what the drawbacks of such an approach are? Is it worth it? I understand that the question might not be simple due to many nuances, but anyway it would be great to hear your 2 cents.

    Looking forward to the new series. Keep it up!

    -Ivan


    1. Thanks for the comments 🙂

      An enterprise wide canonical model, or a shared kernel (in DDD terms), is often a huge and risky endeavour. There are pros & cons, where one of the pros is that you have a central place to change the types.
      One of the cons is that when you change them, all users of the types are forced to adhere and change at the same time, unless you start to version your contracts (which is a field by itself).
      So if you change a certain contract, then all projects/services/teams need to be able to adhere to the new contract(s) in the new jar all at the same time. This means that all services/projects/teams need to align on specific delivery dates and not go over time (or everyone goes over time).
      This requires a lot of planning and I have only seen it fail. In my experience there will always be someone who thinks they don’t need to change just because someone else has a need, or a political fight that causes teams to disagree and go their separate ways, etc.
      If your contracts are stable, then a canonical model can make sense. Otherwise I will recommend sticking to a canonical model for only value types/objects which are stable and that everyone agrees upon (e.g. ids).

      Canonical models also tend to ignore bounded contexts (or business capabilities) and try to “force” everyone to agree on certain domain concepts, like e.g. a customer, an account, etc. The results are often huge domain objects with many optional fields/associations.
      A simple example is an Address. In a company I worked with, they tried to unify the concept of an address into a reusable service, which ended up having a huge domain model (which defined the canonical model for addresses).
      The challenge was that, depending on their use cases, different services/applications would view an Address very differently. Many would view them as Value objects, basically describing where a customer, employee, etc. lived or worked. These addresses didn’t have a unique id; they just had a bunch of fields that combined described a specific Address.
      Other services/applications were more GIS oriented and would view an Address as an Entity, which has a unique identity, typically the GPS location. These addresses would typically be combined with a lot of different information, like physical structures and buildings, that were totally irrelevant for addresses that were linked to customers.
      The life cycles of these addresses were also very different. The rules were very different, and as a result the Address service grew to be very confusing to use and maintain. In other words, it tried to be something for everyone and ended up being nothing for most.

      I hope this gave some answers 🙂

      /Jeppe


  9. Hi Jeppe,
    First I wanted to say that this is a great series. I’m currently working on implementing a large scale system using the microservice architecture, and your posts have resolved several doubts I had (and some I didn’t know I had). I’d like to ask a couple of questions.

    1) If I have a large number of services (in the tens or more), what would be the best way for our clients to access them? Should there be a sort of gateway/proxy/router that receives all requests and responds with an HTTP 301 (Permanent Redirect), so the client caches the new location for each resource and all subsequent requests for it are correctly placed? Or should there be a discovery service that clients can query so they automatically set up the internal references? Or should our clients have all the needed urls hard coded?

    2) This question was already asked but I wasn’t so clear on the answer. How should services access data owned by other services if they search their cache and it isn’t there? I understand that when data is added or modified, the owning service should queue a message informing all interested parties of the change, but what happens when, say, the Sales Service needs info for a product that isn’t in the cache, either because it hasn’t been propagated yet, the cache expired, or any other reason? Should the Sales Service ask the Product Catalog Service for the info on the product, or have an interface for read-only access to the product data just in case, or is there another way that I haven’t thought of?

    Thanks 🙂


    1. Hi Mauricio

      1) Hardcoded urls can work, but that requires that endpoints don’t change very often. I prefer to use a discovery service, to avoid having all clients go through a “centralized” gateway. The simplest discovery service is DNS; a more advanced one could be built on ZooKeeper or e.g. Hazelcast.

      2) As far as possible, I prefer to avoid both caching service data inside other services (e.g. using Events) and having services call each other. This requires well defined service boundaries and sometimes a different approach to service composition. Each use case is different and the trade-offs are also different (especially when you consider the services/applications already in place). Certain use cases can be solved using a composite UI (e.g. invoice creation), and other times use case analysis can reveal that e.g. certain product information is only used by the sales service. In such a case it will be more appropriate to make the sales service the owner of the product data in question. If none of this is possible, then having a dedicated read service-operation tailored to the sales service is the next best option. The challenge then becomes: is the product data needed for online processing (e.g. where a customer is waiting on the other end) or is it more after-the-fact (e.g. batch/event driven) processing where there isn’t a customer impatiently waiting? If it’s the first case, then I will suggest placing the read service-operation into a service/autonomous-component that can be colocated (i.e. on the same machine or even in the same process) with the sales service to avoid the remote communication penalty.

      I hope this answers your questions. I hope to cover more on composite UIs and service composition in later blog posts, but my schedule has been busier (> 100%) than expected when I promised part 5 soon.

      /Jeppe


      1. Jeppe, this has been a very informative and thought provoking series of posts. I wanted to thank you for the time you’ve taken to put this together and to say, as many other commenters have, that I’m looking forward to part 5!


  10. Hi Jeppe!
    Just finished 6 blog posts in a row and I have to say this is the best explanation of SOA, DDD, EDA and micro-services I have read so far. I’m glad I checked your blog after one of your talks on Vimeo.

    I was hoping you could expand more and maybe point to some resources about how to properly cache the data from other services (Data duplication over Events).
    For example:
    1- Where do you cache the data? Inside the service? In a dedicated service?
    2- Should the event have the data required to update the cache? Or should I just issue a new “getAllProducts()” on the ProductCatalogue to renew the whole cache every time?
    3- In what form do I keep the cache? In a form ready for consumption? As the raw data that is returned from the service? (the ProductCatalogue service may expose a getAllProducts() method that returns extra data that I’m not interested in)

    hope you find time for a reply 🙂
    btw I’m waiting for parts 5,6,7,… and beyond!!


    1. Hi Sherif – thanks for the comments and sorry for the late reply 🙂

      1 – The service that subscribes to events from other services in order to build a cache/replica should store the data internally in the best possible way (see point 3). The question is of course where in the service this responsibility should be placed. In part 5 I dig deeper into this (hint: it could be inside a microservice).

      2 – That depends 😉 very much on the frequency of change, i.e. how long you can cache the data, and how large the details for each change are (is it a delta event or a bulk event?). If you can cache the data for a long time (e.g. the name of a product), then I would let the event carry the name of the product. If it’s something that changes often and contains a lot of information (like a new departure table for a train for the next week), then I would let the event carry just enough information for subscribers to determine whether they want to fetch the details (sketched below, after point 3). Sometimes it’s simply better to call the other service/system to fetch the data needed instead of potentially caching their entire data set (which can be huge). I would also suggest looking at composite UIs (which I discuss in part 5) as a different way of having all the right data available at the UI (or in an integration scenario).

      3 – I would follow the guidelines of CQRS and have my cache/read models prepared to fit the use case they’re designed for. This means that every time I receive an event I may update multiple read models. Each read model will be tailored to a specific use case and only contain the information needed (throw out data that isn’t), in the best possible structure (e.g. completely denormalized), and perhaps be served from memory instead of from a db (depending on the size of the data, usage patterns, etc.)
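      To make points 2 and 3 a bit more concrete, here is a minimal Java sketch – all names (ProductNameChanged, the read-model classes) are hypothetical, not from the post. A slowly changing attribute travels in the event itself, and a single subscriber updates two use-case-specific read models from it:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Point 2: for slowly changing data, the event carries the new value itself,
// so subscribers never need to call back into the Product Catalogue.
record ProductNameChanged(String productId, String newName) {}

// Point 3: each read model is tailored (denormalized) to a single use case
// and keeps only the data that use case needs.
class ProductSearchReadModel {
    private final Map<String, String> nameById = new ConcurrentHashMap<>();
    void apply(ProductNameChanged evt) { nameById.put(evt.productId(), evt.newName()); }
    String nameOf(String productId) { return nameById.get(productId); }
}

class InvoiceLineReadModel {
    private final Map<String, String> displayNameById = new ConcurrentHashMap<>();
    void apply(ProductNameChanged evt) {
        // Any catalogue data invoicing doesn't need is simply thrown away.
        displayNameById.put(evt.productId(), evt.newName());
    }
}

// One incoming event may update several read models.
class ProductEventHandler {
    private final ProductSearchReadModel search = new ProductSearchReadModel();
    private final InvoiceLineReadModel invoicing = new InvoiceLineReadModel();

    void on(ProductNameChanged evt) {
        search.apply(evt);
        invoicing.apply(evt);
    }
}
```

      Whether those read models live in memory or in a db is, as noted above, a question of data size and usage patterns.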


  11. This is the most important post of the series. My comments:

    1) It would be useful to many readers if you explained what a monolith is beforehand. I say that especially because much of the material written about MSA has used this term, and much of that material mixes the deployment unit with the runtime service element when talking about size and about monoliths. For example, imagine you design your solution using small REST services that are quite specialized and have high cohesion, well-defined functional contexts, and small memory footprint. Imagine you used Java to implement these services and you packaged them in a war file for deployment. Did you use microservices in this solution? No, you didn’t. According to what I’ve been reading about MSA, this solution is monolithic because of the deployment unit! Although the services are “micro” if you look at their contract design and implementation, they’re deployed within a monolith. Since your post focuses on the service design and not much is said about deployment units, it would be useful to quote Lewis and Fowler, for example, who see the monolith as a single logical executable that is built and deployed together.

    2) Where you say “big bowls of mud”, I think you should say “big balls of mud”. (My good friend Joe Yoder would not forgive me if I didn’t point that out. :^))

    3) Where you say “Shipping would also not care about QOH”, I think you should say “On the other hand, Shipping would not care about QOH”.

    4) The statement “This is what is known as different Bounded Contexts in Domain Driven Design (DDD)” is unclear because the reader can’t tell what “This” refers to.

    5) The text says “Moving this dependency to services and service contracts basically just removes the technical coupling – the data- and functionality-coupling is still there and our services won’t be autonomous.” Well, this series of posts talks about so many different kinds of coupling that it’s hard to make sense of the first sentence. In any case, the design shows improvement because the subdomains now are coupled to the Product service contract and not directly to the Product data table. The Product service has freedom to change the product data persistence. However, the autonomy of these subdomains (i.e., other services that now interact with the Product service) *decreases*. Service autonomy is not “binary”, so I can’t agree with a phrase like “our services won’t be autonomous”. I’d rather say something like “service autonomy will decrease perhaps to an unacceptable level”.

    6) Two comments about the bullet that says “The final example would be that the Sales Service contains cache/duplicate of the Product Catalogue’s data…” The first is that you could say “final option” instead of “final example”. The second is that this solution falls under the SOA design pattern called “service data replication”, described in the “SOA Design Patterns” book by Thomas Erl et al. In particular, I like the term data replication better than cache, because in my background a cache usually implies that a cache miss would require a data fetch from the master data store. So, I would rather call Product within “Sales Services” a replica than a cache, but that’s just terminology and, as you know, YMMV…

    7) You say that event contracts tend to change less often than normal functionality-hungry contracts. Another important point to make is that event data should use a format that supports compatible change. For example, if the event data is in XML and you add non-mandatory elements to the XML schema, that change doesn’t break the subscribers (see the tolerant-reader sketch after this list).

    8) You say that the Product Catalogue service “doesn’t know what the Sales services intends to do with the events it receives”. In fact, the Product Catalogue service doesn’t even know which services get the events it publishes.

    9) I think your description of “data duplication over events”, which is a specialization of “service data replication”, lacks a discussion of impact or tradeoffs. The benefits are well explained, but the text neglects to explicitly mention that the complexity of the solution increases significantly: you need to implement the Product replication (cache), configure a message channel, and configure Product Catalogue to act as a publisher/observable element. Also, the message channel is not fail-proof – it requires monitoring and possibly additional exception-handling mechanisms (e.g., processing messages in a DLQ, durable subscriptions). Also, the memory footprint of Sales Services (in the example) may increase if the Product cache is in memory – if not, there might be a performance overhead due to I/O. So, the benefits of increased autonomy and predictability come at a cost, and the architect should be aware of the tradeoffs to make the right decision in light of the requirements.

    10) You say “… to use 2 way communication to coordinate our business processes (which is also known as orchestration)”. Later on you say “This form of coordination between services using events is also known as Choreography…” The terms orchestration and choreography have each become overloaded in recent years. In this post there’s this association: orchestration = 2-way communication and choreography = event-based communication. I find this view too strict. Orchestration and choreography are paradigms for service composition that can both use synchronous and asynchronous communication. The main difference is that orchestration presumes a central orchestration element/engine/server/hub (see the contrast sketch after this list). In a recent book (Building Microservices), Sam Newman gives an example of an orchestrated design that uses request/response calls and compares it with an alternative design that uses events. Newman’s example, just like yours, uses req/resp in the orchestration and events in the choreography. But Newman doesn’t say that coordination using request/response is known as orchestration or that coordination using events is known as choreography. In fact, here’s his description of these terms: “With orchestration, we rely on a central brain to guide and drive the process, much like the conductor in an orchestra. With choreography, we inform each part of the system of its job, and let it work out the details, like dancers all finding their way and reacting to others around them in a ballet.”

    11) Where you say “… this triggers that an AcceptOrder command message is sent to the Sales Service”, it is unclear whether this message is synchronous or asynchronous. The diagram that follows doesn’t clarify that either, because there’s no notation key/legend.

    12) A wrap-up comment. This post gives me the impression that *the* design goal is: design service boundaries so that functionality is not invoked using 2-way communication, ever. I think *a* design goal should be service autonomy. However, there are other design goals, there are other quality attribute requirements that the architect should consider. The job of the architect is to weigh the tradeoffs when making design decisions. For example, you may compromise latency to improve security. Each problem has different requirements. There isn’t a solution that fits all cases.
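    Regarding point 7, the same principle applies to JSON payloads. Here is a minimal sketch using Jackson, with a hypothetical ProductPriceChanged event (not from the post): by ignoring unknown properties, an older subscriber keeps working when the publisher adds non-mandatory fields.

```java
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.databind.ObjectMapper;

// An older subscriber's view of the event. Ignoring unknown properties means
// the publisher can add new, non-mandatory fields without breaking this consumer.
@JsonIgnoreProperties(ignoreUnknown = true)
class ProductPriceChanged {
    public String productId;
    public double newPrice;
}

public class TolerantReaderDemo {
    public static void main(String[] args) throws Exception {
        // The publisher has since added a "currency" field; this subscriber
        // simply skips it instead of failing to deserialize.
        String payload = "{\"productId\":\"P-1\",\"newPrice\":9.95,\"currency\":\"EUR\"}";
        ProductPriceChanged evt = new ObjectMapper().readValue(payload, ProductPriceChanged.class);
        System.out.println(evt.productId + " -> " + evt.newPrice);
    }
}
```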
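    And regarding point 10, a short contrast sketch in the spirit of Newman’s description (all service and event names hypothetical): the distinguishing feature is where the process knowledge lives, not whether the individual calls are synchronous.

```java
// Orchestration: a central element knows and drives the whole process.
class OrderOrchestrator {
    private final SalesService sales;
    private final BillingService billing;
    private final ShippingService shipping;

    OrderOrchestrator(SalesService s, BillingService b, ShippingService sh) {
        this.sales = s; this.billing = b; this.shipping = sh;
    }

    void placeOrder(String orderId) {
        sales.acceptOrder(orderId);  // the orchestrator decides the sequence;
        billing.charge(orderId);     // whether each call is sync or async
        shipping.ship(orderId);      // is a separate design decision
    }
}

// Choreography: each service reacts to events; no one holds the full process.
class BillingEventHandler {
    void on(OrderAccepted evt) { /* charge the customer, then publish OrderBilled */ }
}
class ShippingEventHandler {
    void on(OrderBilled evt) { /* ship the order */ }
}

interface SalesService { void acceptOrder(String orderId); }
interface BillingService { void charge(String orderId); }
interface ShippingService { void ship(String orderId); }
record OrderAccepted(String orderId) {}
record OrderBilled(String orderId) {}
```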


    1. Hi Paulo.

      Again thanks for your thorough comments:

      1) I’ve added a description that hopefully makes it clearer what a monolith is and how its deployment model differs from microservices.

      2) Corrected. Thanks 🙂

      3) Corrected 🙂

      4) I’ve changed “this” to make it more clear what I mean.

      5) Good point – I agree that the previously definitive statement about autonomy was too rigid.

      6) Corrected. I agree about cache being a poor name, but I included it for completeness. Personally I prefer the name duplication for when data is “duplicated” between two services (where one is the authority on the data and the other just keeps a copy). I tend to use replication for when data is replicated between different microservices all belonging to the same service/bounded context (more of a technical thing). YMMV 🙂

      7) Added an explanation about this, as it IS important.

      8) Added this too. Thanks 🙂

      9) I added some details about the tradeoffs you mentioned.

      10) True – while re-reading the text, one could get the impression that orchestration was only two-way communication. I added an explanation about the centralization of the coordination being the thing that differentiates orchestration from choreography. Thanks for pointing it out 🙂

      12) As mentioned in the text, the design is all about getting the best possible Autonomy. I also mentioned that sometimes e.g. authority is more important than autonomy, in which case your design has to choose different patterns.

