7 January 2017

Event stores and event sourcing: some practical disadvantages and problems

Event sourcing is based on the idea that we can record changes to the state of a system as events and rebuild state by passing these events through some recursive logic. This process of deriving state from immutable events is a simple yet powerful idea. It provides a historical record that allows you to reconstitute state from any point in the past.
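To make the idea concrete, here is a minimal sketch in Python using a hypothetical bank account: current state is never stored directly, it is just a left fold over the event history.

    from functools import reduce

    # Hypothetical events for a simple bank account
    events = [
        {"type": "AccountOpened"},
        {"type": "FundsDeposited", "amount": 100},
        {"type": "FundsWithdrawn", "amount": 30},
    ]

    def apply(state, event):
        """Fold a single event into the current state."""
        if event["type"] == "AccountOpened":
            return {"balance": 0}
        if event["type"] == "FundsDeposited":
            return {"balance": state["balance"] + event["amount"]}
        if event["type"] == "FundsWithdrawn":
            return {"balance": state["balance"] - event["amount"]}
        return state

    # Current state is a left fold over the entire event history
    state = reduce(apply, events, None)
    print(state)  # {'balance': 70}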

This may sound elegant, but any implementation of event sourcing involves some practical challenges, starting with how you persist these events.

Typically, an event store models commits rather than the underlying event data. An event store doesn’t have to be concerned with the structure of event data; this is left to the recursive logic that determines state. This agnostic approach to data gives you the potential to scale very nicely by doing away with structures such as foreign keys and avoiding the kind of contention and locking that can bring down relational databases under load.
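A commit record in such a store might look something like the following sketch (the field names are illustrative, not any particular product’s schema): the store understands streams, identifiers and timestamps, while the event data itself remains an opaque blob.

    import json
    import uuid
    from datetime import datetime, timezone

    def build_commit(stream_id, events):
        """Wrap events in a commit record. The store only tracks stream
        ids and timestamps; the payload is an opaque, schemaless blob
        that is left to consumers to interpret."""
        return {
            "commit_id": str(uuid.uuid4()),
            "stream_id": stream_id,
            "committed_at": datetime.now(timezone.utc).isoformat(),
            "payload": json.dumps(events),
        }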

This simplicity and scalability can start to break down once you subject an event store to real world complexity. If an event store is applied to anything but a very small and static domain model, it can start to run into a number of technical and operational difficulties.

Scaling with snapshots

One problem with event sourcing is handling entities with long and complex lifespans. Entities that are defined by frequent changes in state can become a problem due to the sheer number of events that have to be processed to determine current state.

Event store implementations typically address this by creating snapshots that summarize state up to a particular point in time. This reduces query load as you only need the most recent snapshot along with any events committed since the snapshot’s creation.
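The query side might look something like this sketch, which assumes a store with hypothetical latest_snapshot and events_after methods and reuses the apply function from the first example:

    def current_state(store, stream_id):
        """Rebuild state from the latest snapshot plus any newer events."""
        snapshot = store.latest_snapshot(stream_id)
        state = snapshot.state if snapshot else None
        version = snapshot.version if snapshot else 0
        # Only the events committed since the snapshot need replaying
        for event in store.events_after(stream_id, version):
            state = apply(state, event)
        return state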

The question here is when and how snapshots should be created. This is not straightforward, as it typically requires an asynchronous process to create snapshots in advance of any expected query load. In the real world this can be difficult to predict. An effective snapshot strategy may require a complex set of algorithms that are tailored for whatever processes need to access the event store.

Visibility of data

Developers and architects may like the processing power provided by event stores, but support teams tend to be less keen. In a generic event store, event data tends to be persisted as opaque payloads in JSON or some other agnostic format. This can obscure the data and make it difficult to diagnose data-related issues.

In data-intensive applications support issues are often caused by data anomalies rather than code-based bugs. A support team typically needs visibility over the data that contributes to any particular problem. This is difficult when the data is only available in an abstract form and requires processing by some recursive logic before it can be used by an application. Unless careful consideration is given to the visibility of data, support incidents can be very difficult to unwind.

A fix for a support incident might also require a change to data. This is not straightforward for an event store as committed events are supposed to be immutable. You can issue a new event that corrects the data, but this will only correct future calculations. Any calculations from a previous point will continue to include the bad data.
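Building on the hypothetical account example above, a compensating event only affects replays that include it:

    # A compensating event corrects the balance for any replay that
    # includes it...
    def apply_v2(state, event):
        if event["type"] == "BalanceCorrected":
            return {"balance": state["balance"] + event["amount"]}
        return apply(state, event)  # fall back to the original handlers

    events.append({"type": "BalanceCorrected", "amount": -10})

    now = reduce(apply_v2, events, None)            # corrected: {'balance': 60}
    last_week = reduce(apply_v2, events[:3], None)  # still wrong: {'balance': 70}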

Handling schema change

If making changes to data is difficult, then changes to the structure of data represent a different can of worms. Events may be immutable, but the structure of future events may change. You are left with a choice between updating all the existing events or handling multiple schemas in your code.

This is a straightforward problem to solve for a relational database, as you can change the entire structure through an update and migration script. To do this in an event store you will need to adjust each event in turn, which can take some time over millions of records. It also rather flies in the face of the idea of immutable events.

If you want to preserve the immutability of events, you will be forced to maintain processing logic that can handle every version of the event schema. Over time this can give rise to some extremely complicated programming logic.
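A common workaround is an “upcasting” function that translates old events into the current schema on read. This sketch, with illustrative field names, shows how the branches accumulate with every schema version:

    def upcast(event):
        """Translate older event versions to the current schema on read.
        Every schema change adds another branch that must be maintained
        for as long as old events remain in the store."""
        if event.get("version", 1) == 1:
            # v2 split the single "name" field into first and last names
            first, _, last = event["name"].partition(" ")
            event = {**event, "version": 2,
                     "first_name": first, "last_name": last}
        if event["version"] == 2:
            # v3 introduced a mandatory "country" field with a default
            event = {**event, "version": 3, "country": "GB"}
        return event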

Dealing with complex, real world domains

Event stores are not a generalised pattern that can be applied across a large domain. They are suited to relatively simple and self-contained models. The model works well for simple domains defined by one or two events, but once you start aggregating multiple types of event stream, any processing logic can quickly become cumbersome.

Given how quickly processing complexity can escalate once you are handling millions of streams, it’s easy to wonder whether any domain is really suitable for an event store. The inflexible nature of immutable events also makes them a poor candidate for modelling any domain that is subject to change. Arguably, this means every domain!

It’s worth pointing out that applications don’t always have to interact with the raw stuff of events. Many event sourcing systems expose derived state to consuming applications, maintaining a separation between a working copy of data and the underlying event store. This can overcome some issues of complexity, but a difficulty remains in working out how best to expose this working copy so that the data is timely and relevant.
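As a sketch of this separation, a read model might subscribe to published events and maintain its own queryable copy of state (again reusing the hypothetical apply function from the first example):

    class BalanceReadModel:
        """A derived working copy of state, maintained separately from
        the event store so consumers never touch raw events."""

        def __init__(self):
            self.balances = {}

        def handle(self, stream_id, event):
            # Called as each commit is published; keeping this copy
            # timely and relevant is the hard part
            self.balances[stream_id] = apply(
                self.balances.get(stream_id), event)

        def balance_for(self, stream_id):
            state = self.balances.get(stream_id) or {}
            return state.get("balance")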

The problem of explanation fatigue

Perhaps one of the biggest challenges with an event store is the constant need to explain it.

Event stores are an abstract idea that some people really struggle with. They come with a high level of “explanation tax” that has to be paid every time somebody new joins a project. This could be a sign that you are trying to implement an architectural thought experiment rather than a viable processing architecture.

Besides, event stores are only really suitable for domains that are simple, well-understood and tend not to change over time. Doesn’t this rule them out of most real world scenarios?

Filed under Architecture, Design patterns.