Text updated on the 21st of June 2021
The other day I read that the new Proask system for the National Board of Industrial Injuries in Denmark was the first major project that would realize the Ministry of Employment’s strategic decision to use a Service Oriented Architecture (SOA). For those who have not heard of Proask, it is yet another heavily delayed public project which, like most other public projects, is trying to solve a very big problem in one large chunk. A lot can be written about this approach, but in this blog post I will focus on their approach to SOA. A related article reports that the new Proask system is 5 times slower than their old system from 1991.
The Proask project was initiated in 2008. It made me think back on another (private) SOA prestige project from the same period, for which I was the architect for a subcontractor. The entire project was built around SOA, with many subsystems that would deliver services. The entire architecture was built around an ESB that would act as a facilitator in terms of mapping and coordination. All communication was done as synchronous WebService calls over HTTP(S). So classic SOA for the period 2003-201? (sadly, synchronous calls are still the predominant form of integration today). This SOA realization was also characterized by very poor performance, high latency and low stability.
But why? The reason lies in the way they had chosen to split up the services, and not least in the fact that they had chosen to communicate synchronously between the different services and made use of a layered SOA approach.
So what’s wrong with synchronous integration?
As such, nothing. It depends on how and when it is being used. If we integrate with older applications, there is often no other option than to use the synchronous API that is available, if a public API exists at all. This kind of integration is already happening on the trailing edge, meaning that the API, data ownership and functionality of the service/application we are integrating with are already carved in stone and hard, if not impossible, to change. It is also important to realize that such an integration falls under the category “Integration by a bunch of Web services” or at best “Enterprise Application Integration (EAI)” over WebServices / REST / etc. Synchronous integration between services IMO has nothing to do with SOA, since it breaks several of the basic SOA principles.
Don Box, the man behind SOAP, suggested the 4 Tenets of service orientation:
- Boundaries are explicit
- Services are autonomous
- Services share schema and contract, not class
- Service compatibility is based on policy
Principles 1 and 2 are easily overlooked, and the focus has mostly been on Principles 3 and 4.
For me, Principles 1 and 2 are very important because they help guide us when we design services and assign responsibilities to them. They also help us answer why synchronous calls between services should be avoided.
First SOA Principle: Boundaries are explicit
An explicit service boundary can mean different things to different people.
What I am emphasizing is: a service’s responsibilities with respect to both data and functionality are clear and cohesive.
By cohesion I’m referring to the following types:
- Communicational cohesion – meaning that logic and the data the logic is working on are both part of the same service
- Layered cohesion – meaning a service is responsible for everything: data persistence, handling business-related-security*, user interface, etc.
- Temporal cohesion – meaning the parts / aspects involved in handling a service’s functionality are grouped together so that they can be executed close to each other in time
Without communicational cohesion and layered cohesion, it is impossible to achieve temporal cohesion, which is a prerequisite for the Second SOA Principle “Services are autonomous.”
* Business related security – for example is this user allowed to perform this action like cancel this purchase, approve this transfer, etc.
Most of us have grown up with monolithic applications driven by databases. What I have observed is that this type of application tends to end up with a large / thick data model where everything ends up being connected to everything else, mostly because it is easy and convenient to create a join table, a union in a query, etc. by following the path of least resistance.
It usually starts out simple but easily ends up as a big ball of mud:
Slowly our data models grow in size until they finally get confusing and messy
One of the challenges of large data / domain models is that the same entities / tables end up being used in many different contexts (use cases).
Knowledge of which associations / columns / properties you can / may use lies hidden in the code and ends up being implicit knowledge. Examples of this are the small rules people are told to remember: “Do not join these tables unless the customer is in arrears” or “the property x only has a valid value if property y is true.”
Another problem is that, in the name of normalization or reuse, we prefer to reuse the same entities / tables to represent two or more similar entities in our domain, without questioning whether they are in fact the same concept, different perspectives on the same concept, or different concepts with conflicting names. Within each domain, for example online retail, there are many sub-domains that have their own needs and specialties. One of the reasons that we end up with large domain models is that we do not break up our models per sub-domain.
Sub-domains within the retail domain
Just because some of our use cases happen to involve an entity called “Product” does not mean that they deal with the same concept or share the same meaning of the word “Product”.
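As a rough illustration of the point about “Product”, here is a minimal Python sketch in which each sub-domain keeps its own model of a product and shares only an identifier. The sub-domain names other than Shipping, as well as all class and field names, are assumptions made up for this example and are not taken from any real system.

```python
from dataclasses import dataclass
from decimal import Decimal


# Hypothetical sub-domain models; every class and field name is illustrative.

@dataclass(frozen=True)
class CatalogProduct:
    """'Product' as a catalog sub-domain might see it: something to present."""
    product_id: str
    name: str
    description: str


@dataclass(frozen=True)
class PricingProduct:
    """'Product' as a pricing sub-domain might see it: something with a price."""
    product_id: str
    unit_price: Decimal


@dataclass(frozen=True)
class ShippingProduct:
    """'Product' as the Shipping sub-domain might see it: something to pack."""
    product_id: str
    weight_kg: float
    fragile: bool


def shipping_cost(product: ShippingProduct, rate_per_kg: Decimal) -> Decimal:
    """Shipping logic only touches shipping data, so both live in one place."""
    base = Decimal(str(product.weight_kg)) * rate_per_kg
    surcharge = Decimal("5.00") if product.fragile else Decimal("0.00")
    return base + surcharge
```

The three classes share nothing but `product_id`. Shipping can calculate its cost without asking any other sub-domain for data, which is the kind of communicational cohesion described earlier.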
Unfortunately most of us have only learned to look for nouns (entities) and retrofit verbs (functions) when we analyze use cases / stories / etc. There is no good guidance on how to figure out whether the nouns / entities we find really deal with the same concept or the same perspective on a concept. It is risky to automatically elevate common Entities to sub-domains/services. Unfortunately this is what happens in most organizations: we end up with a Customer Service or a Product Service.
Where it becomes problematic is when we take data / perspectives on e.g. a Customer from other sub-domains (e.g. Billing, Shipping, etc.) and mix them into the Customer sub-domain/service. Due to cognitive bias we quickly start justifying, based on the name alone, why another Entity or Value Object belongs to the same domain. The result of this bias is that the Customer sub-domain grows in complexity and quietly loses data cohesion, because it’s trying to be something for everyone and risks being everything for no one. It’s hard to satisfy everyone’s expectations without either disappointing them or succumbing to the pressure to add more data or logic, even though it doesn’t really belong in the service. It’s cases like this that blur our service boundaries.
When we get these centralized islands of data / functionality that have assumed responsibilities from other services, we quickly end up in a situation where it’s necessary for other services to call our service in order to acquire the information they need to perform their tasks.
And what is the easiest way to request information from other services? Some form of two-way communication (e.g. synchronous WebService / REST calls over HTTP, GraphQL, etc.).
Data service islands, synchronous communication and coupling
Conclusion: If our services have well-defined boundaries, i.e. the data and the logic we need are present within the service, then we are well on our way to achieving communicational cohesion, layered cohesion and temporal cohesion, and thereby temporal decoupling of our service from other services. The higher the cohesion we have within the service, the less we need to rely on other services for our own service to perform its work. The more a service needs to talk to other services, the more they start knowing about each other, which increases our coupling.
We can’t avoid essential coupling, i.e. the natural coupling of the domain, but we must try to avoid accidental coupling introduced by poor service boundaries.
In a future blog post I will look into how we can analyze a domain and identify sub-domains and services.
Second SOA Principle: Services are autonomous
Autonomy means that our service is independent and self-contained and as far as possible doesn’t directly depend on other services to be functional.
97% of the SOA solutions I’ve seen mainly use synchronous service calls as their form of integration. In some contexts it is impossible to avoid synchronous integration, but in most contexts it is certainly possible to avoid it by making sure that we do not violate the first SOA Principle “Boundaries are explicit”.
Why is synchronous integration so problematic?
First of all, synchronous request/response style integration causes us to be temporally, contractually and behaviourally coupled, because we depend on other services to be available for our service to work and we depend on them not to change their contracts in a way that breaks our implementation. There are many other challenges that make synchronous integration even worse. Let’s base our discussion on the sequence diagram below and identify some of the problems that arise when integrating using synchronous function calls (no matter if it’s Web services or REST):
Synchronous integration example
Here we have a Process Service “Create Sales Order Service” which calls the two Task Services “Create Order Service” and “Create Invoice Service” in sequence. Each of these Task Services in turn calls two of the Entity / Data Services “Order Service”, “Customer Service” and “Invoice Service”.
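The call chain can be sketched in a few lines of Python. This is only a simulation under stated assumptions: the remote calls are replaced by in-process stand-ins, the latency figures are invented, and which two Entity Services each Task Service calls is my guess (Customer plus Order, and Customer plus Invoice).

```python
class ServiceUnavailable(Exception):
    """Raised when a (simulated) remote service cannot be reached."""


class EntityService:
    """Stand-in for a remote Entity / Data Service reached via a blocking call."""

    def __init__(self, name, latency_ms, available=True):
        self.name = name
        self.latency_ms = latency_ms  # invented figure, not a measurement
        self.available = available

    def call(self, trace):
        if not self.available:
            raise ServiceUnavailable(self.name)
        trace.append(self.name)
        return self.latency_ms  # simulated time the caller spends waiting


def create_order(order_svc, customer_svc, trace):
    # Task Service: waits for both Entity Services, one after the other.
    return customer_svc.call(trace) + order_svc.call(trace)


def create_invoice(invoice_svc, customer_svc, trace):
    # Task Service: same pattern, different Entity Services.
    return customer_svc.call(trace) + invoice_svc.call(trace)


def create_sales_order(order_svc, customer_svc, invoice_svc):
    # Process Service: calls the two Task Services in sequence.
    trace = []
    waited_ms = create_order(order_svc, customer_svc, trace)
    waited_ms += create_invoice(invoice_svc, customer_svc, trace)
    return waited_ms, trace
```

With simulated latencies of 40, 30 and 50 ms for the Order, Customer and Invoice services, `create_sales_order` waits 150 ms across four nested calls, and an `EntityService` created with `available=False` makes the whole call raise `ServiceUnavailable`.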
Problems identified with the example above:
- Latency: the time from when the “Create Sales Order Service” is called until both Task Services have called their two Entity Services and can return an answer.
- Sometimes you can parallelize some of these calls, and other times there is a forced sequence due to inter-service dependencies (e.g. you cannot use the Customer in the other Task Service before the Customer has been created by the first Task Service).
- It is not uncommon to see examples where > 10 service calls are needed to complete one process service.
- If just one of the underlying services is unavailable, the ENTIRE process service (and all other services that may use the same underlying service or our service) is unavailable. So if the Invoice Service is down, we cannot accept new orders.
- Most processes, such as the one in our example, can benefit from postponing some steps to a later stage/time. E.g. for most online retail systems it will be better to be able to receive an Order even though the Invoice Service is down. Whether the invoice arrives minutes, hours or days later is rarely a major problem.
- If a single one of the service calls above takes a long time, then the entire Process Service takes a long time (weakest link in the chain).
- If one or more of the service calls above update data and we experience Faults / Exceptions or other errors (e.g. an I/O error), we are faced with an inconsistent system, which requires complex compensation logic. Below I have given an example of the challenges you will encounter in terms of compensation. The example is by no means complete. The real solution is much more complex and must take into account that System B can be shut down / crash during the compensation process, which calls for what is also known as resume functionality:
Transactional compensation with synchronous integration
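To give a feel for the basic shape of such compensation logic, here is a deliberately naive Python sketch. It assumes every step comes with a matching undo action and that compensation itself never fails, which is exactly the assumption a real solution cannot make. The function name and the step structure are hypothetical.

```python
def run_with_compensation(steps):
    """Run (action, compensation) pairs in order; undo completed steps on failure.

    Returns True if every action succeeded. If an action raises, the
    compensations of the already completed steps run in reverse order and
    the original exception is re-raised.
    """
    completed = []
    try:
        for action, compensate in steps:
            action()
            completed.append(compensate)
    except Exception:
        for compensate in reversed(completed):
            # A real system must also survive a crash right here, e.g. by
            # durably recording progress so that compensation can be resumed.
            compensate()
        raise
    return True
```

Even this small version shows why the approach is fragile: nothing here handles a compensation that throws, a process that dies halfway through the undo loop, or an action that actually succeeded remotely while the caller only saw a timeout.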
Synchronous and chatty services like the ones above are often caused by a violation of the First SOA Principle “Boundaries are explicit”.
Conclusion: When our services develop data / functionality envy and thereby need to talk to other services, they begin to know too much about each other. This increases our coupling and latency and reduces our stability, because we depend on other services’ availability and contract stability. All this reduces our autonomy and effectively creates a house of cards that can easily be knocked over.
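A bit of back-of-the-envelope arithmetic makes the house of cards tangible. The sketch below assumes that services fail independently of each other and that calls are made strictly in sequence. Both are simplifications, and the numbers in the usage note are invented for illustration rather than measured.

```python
def chain_availability(availabilities):
    """Availability of a process that needs every listed service to be up,
    assuming the services fail independently of each other."""
    result = 1.0
    for availability in availabilities:
        result *= availability
    return result


def sequential_latency_ms(latencies_ms):
    """Total waiting time when the calls are made one after the other."""
    return sum(latencies_ms)
```

For example, `chain_availability([0.99] * 10)` is roughly 0.904: ten services that are each up 99% of the time give a process that is only available about 90% of the time. Likewise, `sequential_latency_ms([30, 40, 30, 50])` is 150, since every sequential call adds its full latency to the total.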
In order to achieve a higher level of service autonomy, we need to avoid integrating using two-way synchronous service calls. And no, it does not help to perform the calls with an async API: the temporal coupling between the services is still the same, since our service cannot continue before the other service has responded. Two-way communication in the form of request / reply or request / response is not the way forward; we need to look in another direction to find a solution to our decoupling and autonomy needs.
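The point about async APIs can be illustrated with Python’s `asyncio`. In this minimal sketch the remote service is simulated with `asyncio.sleep`, and the function names, the delay and the timeout values are all assumptions for the example.

```python
import asyncio


async def invoice_service(order_id, delay_s):
    """Stand-in for a remote service; delay_s simulates its response time."""
    await asyncio.sleep(delay_s)
    return f"invoice-for-{order_id}"


async def create_sales_order(order_id, delay_s, timeout_s):
    # The thread is not blocked while we await, but the *operation* still
    # cannot finish until the other service answers or we give up waiting.
    try:
        invoice = await asyncio.wait_for(
            invoice_service(order_id, delay_s), timeout_s
        )
    except asyncio.TimeoutError:
        return "order rejected: invoice service did not respond"
    return f"order {order_id} accepted with {invoice}"
```

Running `asyncio.run(create_sales_order("42", 0.2, 0.01))` rejects the order because the simulated service is slower than the timeout. The async API frees up a thread, but the outcome of our service still depends on the other service responding in time.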
This will be the subject of the next blog post. Until then, I’m really interested in hearing your opinions and ideas 🙂