Orchestration vs choreography for microservice workflows

Broadly speaking, there are two ways of implementing control flow logic in distributed systems. Orchestration involves establishing a centralised mediator service (the orchestrator) that coordinates interaction between services. Choreography employs a more decentralised approach, where individual services are responsible for managing their own interactions.

These are often presented as opposing, mutually exclusive styles of collaboration, but that does not have to be the case. You can have both co-existing happily within the same architecture.

Collaboration and coupling

Choreography tends to be associated with event-driven architecture, where service collaboration happens through asynchronous messaging and pub/sub communication. This promotes an approach based on “smart endpoints and dumb pipes” where business logic resides in service implementations rather than the integration infrastructure.

Orchestrators, on the other hand, tend to be associated more with real-time, synchronous interfaces such as REST or gRPC. This style of request/response interaction tends to increase coupling as the orchestrator has to talk directly to every service. This in turn can undermine the scalability, resilience, and flexibility of an architecture.

Note that orchestration doesn’t have to lead to this kind of direct coupling. Using an orchestrator just means that you are delegating workflow logic to a centralised service. There’s no reason why you can’t still use a more loosely coupled, event-driven architecture with a centralised orchestrator. By the same token, choreography can still be based on a dance of tightly coupled services communicating through synchronous interfaces.

In this context, the integration medium can be a red herring. The key issue is how you want to manage your workflow logic – delegate it to a centralised controller service or distribute it around your services?
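To make the distinction concrete, here is a minimal sketch of the same three-step workflow written both ways. The order workflow, the step functions, the topic names, and the in-process EventBus are all hypothetical stand-ins invented for illustration; a real system would call remote services and use a message broker.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable

# Hypothetical service operations; each returns an updated copy of the order.
def reserve_stock(order: dict) -> dict:
    return {**order, "stock_reserved": True}

def take_payment(order: dict) -> dict:
    return {**order, "paid": True}

def ship(order: dict) -> dict:
    return {**order, "shipped": True}

# Orchestration: one function owns the sequence of steps.
def orchestrate_order(order: dict) -> dict:
    for step in (reserve_stock, take_payment, ship):
        order = step(order)
    return order

# Choreography: each service reacts to an event and emits the next one.
class EventBus:
    """Toy synchronous pub/sub, standing in for a message broker."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self._handlers[topic]:
            handler(event)

def wire_choreography(bus: EventBus, completed: list[dict]) -> None:
    # No single place states the full sequence; it emerges from subscriptions.
    bus.subscribe("order_placed", lambda e: bus.publish("stock_reserved", reserve_stock(e)))
    bus.subscribe("stock_reserved", lambda e: bus.publish("payment_taken", take_payment(e)))
    bus.subscribe("payment_taken", lambda e: completed.append(ship(e)))
```

Calling orchestrate_order({"id": 1}) and publishing {"id": 1} to "order_placed" on a wired bus both produce the same final order. The difference is where the sequence is written down: in one function, or spread across three subscriptions.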

Orchestrators are not for the faint-hearted...

Orchestration has the potential to provide greater visibility over business processes, as coordination logic is held in a single place where it can be easier to reason about. It can be easier to detect faults and recover from them. Orchestration can also support the development of small, decoupled services that don’t need to be aware of the wider business processes that they are taking part in.

One problem with orchestration is that it tends to give rise to complex infrastructure. A centralised orchestrator service needs to implement a range of concerns, including control flow, routing, connectivity, retries, data transformation, monitoring and reporting. This can lead to runaway complexity over time as integration logic accumulates in a central integration platform.

In previous generations of service-orientated architecture, an “enterprise service bus” platform would serve as the centralised orchestrator. In this pattern, services would be integrated with a central platform that would take care of any connectivity, translation, and transformation. Apart from serving as an almighty single point of failure, these platforms created development bottlenecks where only a small number of people understood how to operate them.

Vendors in this space generally avoid using the phrase “enterprise service bus” these days, but the pattern is still very much alive. The terminology may have changed, but centralised orchestrators suffer from a familiar set of challenges. They involve a significant learning curve so that integration logic becomes the preserve of a centralised and overworked “integration team”. Over time, integration logic accumulates in an arcane platform that nobody understands, and everybody is too scared to change for fear of breaking something.

Some of these problems are addressed by the emergence of lightweight solutions such as Apache Airflow or even “serverless” orchestrators such as AWS Step Functions and Azure Logic Apps. These tools can serve to “democratise” orchestration in that they are accessible to engineering teams wanting to build workflows from autonomous services. Instead of a single, centralised orchestrator running all service collaboration, you can take a more distributed approach where workflow logic can be delegated to engineering teams.

...but then again, neither is choreography

Choreography devolves responsibility to individual service implementations for progressing any workflow. On the face of it, this decentralisation removes orchestration’s single point of failure, providing greater resilience and supporting easier change.

However, this is not a simplification, as it requires complex patterns and infrastructure to manage at any kind of scale. You’ll need to consider patterns such as sagas, routing slips, and correlation identifiers to enable transactional behaviour and rollbacks. You’ll also need to invest in logging and monitoring infrastructure to enable you to piece together the status of any processing and detect faults.
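As a rough illustration of what a saga with correlation identifiers involves, the sketch below pairs each step with a compensating action and undoes completed steps in reverse order when a later step fails. SagaStep, run_saga, and the log format are hypothetical names chosen for this example. The steps run in a single process for brevity; in a choreographed system each step and its compensation would live in a different service and be triggered by events.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass
from typing import Callable

@dataclass
class SagaStep:
    name: str
    action: Callable[[dict], None]      # does the work, may raise
    compensate: Callable[[dict], None]  # undoes the work of action

def run_saga(steps: list[SagaStep], context: dict, log: list[tuple[str, str]]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    # Every log entry carries the same correlation ID so that the workflow
    # can be pieced together later from otherwise unrelated records.
    correlation_id = context.setdefault("correlation_id", str(uuid.uuid4()))
    completed: list[SagaStep] = []
    for step in steps:
        try:
            step.action(context)
        except Exception:
            log.append((correlation_id, f"{step.name}:failed"))
            for done in reversed(completed):
                done.compensate(context)
                log.append((correlation_id, f"{done.name}:compensated"))
            return False
        completed.append(step)
        log.append((correlation_id, f"{step.name}:done"))
    return True
```

Even in this toy form, the rollback logic is roughly as long as the happy path. Once the steps are spread across services, each of them also has to handle duplicate and out-of-order messages, which is where much of the complexity comes from.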

Choreography is often preferred in architectures that seek to promote loose coupling above all else and avoid concentrating responsibilities within a centralised orchestration platform. However, this distributed approach to workflow can struggle to handle complex logic or provide adequate visibility of workflow execution.

It's easy for distributed systems to fall into choreography without really meaning to. Many implementations distribute logic without putting in place the necessary infrastructure to manage it. This means that logic is unevenly distributed around services, so it becomes difficult to reason about the state of individual processing workflows. These processes become hard to debug, while features such as workflow-level timeouts or audits become very difficult to implement.
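A workflow-level timeout shows why this matters. With no central coordinator, the only way to spot a stalled workflow is to reconstruct its state from the events that services have emitted. The sketch below assumes every event carries a correlation ID and a timestamp and is collected in one place; the Event type and stalled_workflows function are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    correlation_id: str  # identifies the workflow instance
    name: str            # e.g. "order_placed", "shipped"
    timestamp: float     # seconds on a shared clock

def stalled_workflows(
    events: list[Event], final_event: str, timeout_s: float, now: float
) -> dict[str, str]:
    """Map each timed-out workflow's correlation ID to its last seen event.

    A workflow is timed out if it has not emitted final_event and more than
    timeout_s seconds have passed since its first event.
    """
    first_seen: dict[str, float] = {}
    last_name: dict[str, str] = {}
    completed: set[str] = set()
    for event in sorted(events, key=lambda e: e.timestamp):
        first_seen.setdefault(event.correlation_id, event.timestamp)
        last_name[event.correlation_id] = event.name
        if event.name == final_event:
            completed.add(event.correlation_id)
    return {
        cid: name
        for cid, name in last_name.items()
        if cid not in completed and now - first_seen[cid] > timeout_s
    }
```

This only works if every service propagates the correlation ID and publishes its events somewhere they can be aggregated, which is exactly the infrastructure that accidental choreography tends to lack.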

When to orchestrate or choreograph?

Orchestration and choreography can complement each other. This doesn’t have to be an exclusive choice between two opposing styles. An architecture based exclusively on either orchestration or choreography is bound to be inflexible.

There can also be an organisational angle here, as choreography tends to be better suited for highly decentralised organisations. An orchestration-based approach is more effective if you have direct control over all the participating services. This is difficult to achieve in larger, more distributed enterprises.

As a rule of thumb, for interactions within a bounded context or system boundary it might make more sense to use orchestration. The services within a bounded context are more likely to be cohesive and owned by the same team. This makes it easier to bind them together under a common orchestrator. For collaboration between different systems, choreography might be more appropriate. This supports looser coupling between systems that are likely to be less cohesive and maintained by separate teams.