7 January 2017

Event stores and event sourcing: some practical disadvantages and problems

Event sourcing is based on the idea that we can record changes to the state of a system as events and rebuild state by passing these events through some recursive logic. This process of deriving state from immutable events is a simple yet powerful idea. It provides a historical record that allows you to reconstitute state from any point in the past.
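To make the idea concrete, here is a minimal sketch in Python using a hypothetical bank account: current state is never stored directly, it is just a left fold over the event history.

    from functools import reduce

    # Hypothetical events for a simple bank account
    events = [
        {"type": "AccountOpened"},
        {"type": "FundsDeposited", "amount": 100},
        {"type": "FundsWithdrawn", "amount": 30},
    ]

    def apply(state, event):
        """Fold a single event into the current state."""
        if event["type"] == "AccountOpened":
            return {"balance": 0}
        if event["type"] == "FundsDeposited":
            return {"balance": state["balance"] + event["amount"]}
        if event["type"] == "FundsWithdrawn":
            return {"balance": state["balance"] - event["amount"]}
        return state

    # Current state is a left fold over the entire event history
    state = reduce(apply, events, None)
    print(state)  # {'balance': 70}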

This may sound elegant, but any implementation of event sourcing involves some practical challenges, starting with how you persist these events.

Typically, an event store models commits rather than the underlying event data. An event store doesn’t have to be concerned with the structure of event data; this is left to the recursive logic that determines state. This agnostic approach to data gives you the potential to scale very nicely by doing away with structures such as foreign keys and avoiding the kind of contention and locking that can bring down relational databases under load.
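A commit record in such a store might look something like the following sketch (the field names are illustrative, not any particular product’s schema): the store understands streams, identifiers and timestamps, while the event data itself remains an opaque blob.

    import json
    import uuid
    from datetime import datetime, timezone

    def build_commit(stream_id, events):
        """Wrap events in a commit record. The store only tracks stream
        ids and timestamps; the payload is an opaque, schemaless blob
        that is left to consumers to interpret."""
        return {
            "commit_id": str(uuid.uuid4()),
            "stream_id": stream_id,
            "committed_at": datetime.now(timezone.utc).isoformat(),
            "payload": json.dumps(events),
        }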

This simplicity and scalability can start to break down once you subject an event store to real world complexity. If an event store is applied to anything but a very small and static domain model, it can start to run into a number of technical and operational difficulties.

Scaling with snapshots

One problem with event sourcing is handling entities with long and complex lifespans. Entities that are defined by frequent changes in state can become a problem due to the sheer number of events that have to be processed to determine current state.

Event store implementations typically address this by creating snapshots that summarize state up to a particular point in time. This reduces query load as you only need the most recent snapshot along with any events committed since the snapshot’s creation.
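The query side might look something like this sketch, which assumes a store with hypothetical latest_snapshot and events_after methods and reuses the apply function from the first example:

    def current_state(store, stream_id):
        """Rebuild state from the latest snapshot plus any newer events."""
        snapshot = store.latest_snapshot(stream_id)
        state = snapshot.state if snapshot else None
        version = snapshot.version if snapshot else 0
        # Only the events committed since the snapshot need replaying
        for event in store.events_after(stream_id, version):
            state = apply(state, event)
        return state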

The question here is when and how snapshots should be created. This is not straightforward, as it typically requires an asynchronous process to create snapshots in advance of any expected query load. In the real world this can be difficult to predict. An effective snapshot strategy may require a complex set of algorithms that are tailored for whatever processes need to access the event store.

Visibility of data

Developers and architects may like the processing power provided by event stores, but support teams tend to be less keen. In a generic event store, event data tends to be persisted as opaque payloads in JSON or some other agnostic format. This can obscure the data and make it difficult to diagnose data-related issues.

In data-intensive applications support issues are often caused by data anomalies rather than code-based bugs. A support team typically needs visibility over the data that contributes to any particular problem. This is difficult when the data is only available in an abstract form and requires processing by some recursive logic before it can be used by an application. Unless careful consideration is given to the visibility of data, support incidents can be very difficult to unwind.

A fix for a support incident might also require a change to data. This is not straightforward for an event store as committed events are supposed to be immutable. You can issue a new event that corrects the data, but this will only correct future calculations. Any calculations from a previous point will continue to include the bad data.
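Building on the hypothetical account example above, a compensating event only affects replays that include it:

    # A compensating event corrects the balance for any replay that
    # includes it...
    def apply_v2(state, event):
        if event["type"] == "BalanceCorrected":
            return {"balance": state["balance"] + event["amount"]}
        return apply(state, event)  # fall back to the original handlers

    events.append({"type": "BalanceCorrected", "amount": -10})

    now = reduce(apply_v2, events, None)            # corrected: {'balance': 60}
    last_week = reduce(apply_v2, events[:3], None)  # still wrong: {'balance': 70}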

Handling schema change

If making changes to data is difficult, then changes to the structure of data represent a different can of worms. Events may be immutable, but the structure of future events may change. You are left with a choice between updating all the existing events or handling multiple schemas in your code.

This is a straightforward problem to solve for a relational database, as you can change the entire structure through an update and migration script. To do this in an event store you will need to adjust each event in turn, which can take some time over millions of records. It also rather flies in the face of the idea of immutable events.

If you want to preserve the immutability of events, you will be forced to maintain processing logic that can handle every version of the event schema. Over time this can give rise to some extremely complicated programming logic.
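A common workaround is an “upcasting” function that translates old events into the current schema on read. This sketch, with illustrative field names, shows how the branches accumulate with every schema version:

    def upcast(event):
        """Translate older event versions to the current schema on read.
        Every schema change adds another branch that must be maintained
        for as long as old events remain in the store."""
        if event.get("version", 1) == 1:
            # v2 split the single "name" field into first and last names
            first, _, last = event["name"].partition(" ")
            event = {**event, "version": 2,
                     "first_name": first, "last_name": last}
        if event["version"] == 2:
            # v3 introduced a mandatory "country" field with a default
            event = {**event, "version": 3, "country": "GB"}
        return event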

Dealing with complex, real world domains

Event stores are not a generalised pattern that can be applied across a large domain. They are suited to relatively simple and self-contained models. The model works well for simple domains defined by one or two events, but once you start aggregating multiple types of event stream, any processing logic can quickly become cumbersome.

Given how quickly processing complexity can escalate once you are handling millions of streams, it’s easy to wonder whether any domain is really suitable for an event store. The inflexible nature of immutable events also makes them a poor candidate for modelling any domain that is subject to change. Arguably, this means every domain!

It’s worth pointing out that applications don’t always have to interact with the raw stuff of events. Many event sourcing systems expose derived state to consuming applications, maintaining a separation between a working copy of data and the underlying event store. This can overcome some issues of complexity, but a difficulty remains in working out how best to expose this working copy so that the data is timely and relevant.
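As a sketch of this separation, a read model might subscribe to published events and maintain its own queryable copy of state (again reusing the hypothetical apply function from the first example):

    class BalanceReadModel:
        """A derived working copy of state, maintained separately from
        the event store so consumers never touch raw events."""

        def __init__(self):
            self.balances = {}

        def handle(self, stream_id, event):
            # Called as each commit is published; keeping this copy
            # timely and relevant is the hard part
            self.balances[stream_id] = apply(
                self.balances.get(stream_id), event)

        def balance_for(self, stream_id):
            state = self.balances.get(stream_id) or {}
            return state.get("balance")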

The problem of explanation fatigue

Perhaps one of the biggest challenges with an event store is the constant need to explain it.

Event stores are an abstract idea that some people really struggle with. They come with a high level of “explanation tax” that has to be paid every time somebody new joins a project. This could be a sign that you are trying to implement an architectural thought experiment rather than a viable processing architecture.

Besides, event stores are only really suitable for domains that are simple, well-understood and tend not to change over time. Doesn’t this rule them out of most real world scenarios?

Filed under Architecture, Design patterns.