Autonomous bubbles and event streams: Pragmatic approaches to working with legacy systems

There’s no universal definition of what constitutes “legacy” software. Some developers seem to regard it as anything that they did not personally write in the last six months. More seasoned executives will flatly deny use of the term even when gazing out across their estate of mainframe servers.

If legacy means anything at all, it’s a system that is sufficiently outdated to undermine development velocity. The system is inherently unstable and difficult to change safely. Some features cannot be delivered at all, while others become unnecessarily expensive because of the effort required to work around the inadequacies of the application and its environment.

Legacy doesn’t necessarily mean “old”. The accelerating march of frameworks and languages means that it’s easy for code bases to be left behind. Web applications based on earlier versions of JavaScript frameworks such as Angular and Ember can suffer from legacy difficulties when they are only a couple of years old.

Legacy doesn’t have to involve obsolete technology either. Obsolete architecture can have a similar impact on development velocity no matter what the underlying technology. Long-lived code bases tend to suffer from entropy over time, even if the underlying frameworks are kept up to date. Lifting and shifting a tangled code base to a more recent technology won’t suddenly make it more malleable.

Why does this matter?

Legacy systems don’t collapse overnight. There’s no sudden, insurmountable crisis, but rather a long, slow decline in development velocity.

A large, long-lived system inevitably loses its shape over time and becomes more difficult to work with. This accumulation of quick fixes, technical debt and evolving complexity is not restricted to older code bases. It’s surprising how quickly a badly-managed code base can sink into disrepair.

Bigger problems caused by technical obsolescence start to creep in if a system is based on platforms that are not under active development. Vendors may claim ongoing support for a platform, but that’s not the same as actively maintaining it. The underlying runtimes are not kept up to date in response to evolving security threats or emerging protocols.

This also means that there is no wider ecosystem of components and tooling. The techniques and frameworks that developers take for granted in modern ecosystems are not available. Agile technical practices such as test-driven development or continuous integration are effectively closed off, trapping a platform in more uncertain and manually intensive delivery.

Added to this are growing problems of recruitment and retention. You might get lucky and find great developers who are prepared to work on legacy systems, but they are few and far between. It becomes very difficult to maintain a stable team of developers who understand the system in any depth.

Dealing with legacy systems. Or not.

Michael Feathers has written extensively, most notably in “Working Effectively with Legacy Code”, about how you can surround legacy code with tests to enable safe changes. Advocates of microservices suggest that you can gradually decompose an application into smaller service implementations. A similar idea involves developing “strangler applications” that slowly grow around the edges of legacy systems, eventually overpowering them.

The problem with these replacement approaches is that they are very difficult to complete for larger, longer-lived platforms. They often suffer from a fatal loss of momentum, attention and support. This can leave successive failed re-writes lingering in legacy code bases, much like geological layers that provide evidence of multiple cataclysms in the distant past.

A bigger obstacle can normally be found in the internal organisation of the code. At the heart of many legacy systems is a Gordian knot of data and behaviour that is all but impossible to separate out. It’s just not realistic to imagine that you can gently decompose a fifteen-year-old system based on hundreds of tables and thousands of stored procedures.

A hard truth is that in many cases a legacy platform remains the commercially optimal means of delivering functionality. The benefits of migrating to a more modern architecture cannot overcome the astronomical costs of a re-write.

In this case, running a system indefinitely under a form of palliative care starts to appear viable. A small team of seasoned developers are made comfortable with large salaries and generous pension plans. Bugs get fixed, but any new feature development happens elsewhere. This is what a lot of organisations are doing with their legacy systems, even if they won’t admit it.

This is often difficult for development teams to swallow. They need a means to augment legacy platforms with new features. They want to reduce the support burden by modernising common trouble spots. They also want the opportunity to work with more modern technology. How can you address these difficulties without falling into the re-write trap?

Using “bubbles” to separate the new from the old

As with any Gordian knot, a more creative approach is required to solve the problem. A pragmatic solution involves putting in place a structure that enables development alongside the legacy platform. You won’t transform the core application, but you can at least allow some new feature development, introduce some new technologies and even support some gentle decomposition. All without falling for the folly of a re-write.

Eric Evans described establishing a “bubble” for new development that can sit alongside the legacy platform without being directly dependent on it. This is a small, self-contained part of the domain that can be “fenced off” from the main system using an anti-corruption layer. This bubble provides sufficient isolation to allow a small team to develop a solution without being constrained by the legacy platform.

The anti-corruption layer can be as simple as an API that allows data exchange with the legacy platform without directly coupling to it. The idea is to protect the bubble from being contaminated by the legacy platform’s model, so it can implement its own model and architecture.
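As a rough sketch of what that translation might look like, the code below maps a raw legacy record into the bubble’s own domain model. Everything here is hypothetical: the Policy model, the cryptic legacy column names and the date format are stand-ins for whatever your own schema throws at you.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Policy:
    """Domain model inside the bubble: clean names, explicit types."""
    policy_id: str
    holder_name: str
    renewal_date: date


class LegacyPolicyTranslator:
    """Anti-corruption layer: converts raw legacy records into the
    bubble's own model so legacy naming and quirks never leak inside."""

    def to_domain(self, row: dict) -> Policy:
        # 'POL_NO', 'CUST_NM' and 'RNW_DT' stand in for the kind of
        # cryptic column names a legacy schema tends to accumulate.
        return Policy(
            policy_id=row["POL_NO"].strip(),
            holder_name=row["CUST_NM"].strip().title(),
            renewal_date=date.fromisoformat(row["RNW_DT"]),
        )
```

The translation runs in one direction only: nothing inside the bubble ever sees a legacy column name, so the legacy model cannot quietly become the bubble’s model.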

Bubbles can be a useful technique for getting new development up and running, but they tend to be fragile. It’s hard to maintain the discipline of the anti-corruption layer over time and easy to compromise the bubble’s autonomy.

The “autonomous bubble” makes the bubble approach more permanent by separating it completely from the legacy platforms. It should be able to run under its own steam without needing to refer to any legacy systems. This implies that it will run its own data persistence and take responsibility for synchronising with legacy platforms.

This synchronisation could use something as humble as a batch file export, but a more robust approach would involve broadcasting events on a messaging technology. This event-based approach to integration allows for more timely updates but also lends far greater autonomy to the new context.
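A minimal sketch of the bubble’s side of that synchronisation, assuming a hypothetical customer record: the bubble owns its own store (SQLite here purely for illustration) and applies whatever synchronisation events arrive, whether they come from a nightly batch export or a message broker.

```python
import sqlite3

# The bubble keeps its own persistence rather than querying the
# legacy database directly. SQLite is used purely for illustration.
db = sqlite3.connect("bubble.db")
db.execute(
    "CREATE TABLE IF NOT EXISTS customers ("
    "  customer_id TEXT PRIMARY KEY,"
    "  email TEXT,"
    "  updated_at TEXT)"
)


def apply_sync_event(event: dict) -> None:
    """Apply one synchronisation event from the legacy platform.

    The event shape ({'customer_id', 'email', 'updated_at'}) is a
    hypothetical example; the same handler works whether events
    arrive from a batch file or a message broker.
    """
    db.execute(
        "INSERT INTO customers (customer_id, email, updated_at)"
        " VALUES (:customer_id, :email, :updated_at)"
        " ON CONFLICT(customer_id) DO UPDATE SET"
        "   email = excluded.email,"
        "   updated_at = excluded.updated_at",
        event,
    )
    db.commit()
```

Because the bubble reads only its own store at runtime, a legacy outage delays synchronisation but never takes the bubble down.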

Exposing legacy data as event streams

A weakness of the bubble pattern is that it can be difficult to defend the boundary with the legacy system over time. Whatever your choice of anti-corruption layer or synchronisation mechanism there is an interface that needs careful curation. Any changes need to be synchronised between the bubble and the legacy system, requiring modelling work and tortuous parallel planning between development teams.

One technique for avoiding the overhead of an anti-corruption layer is to broadcast all the changes in a legacy system using an event streaming technology like Kafka. Change data capture tools such as Striim and Attunity can capture and transform database updates by processing database transaction logs. This data will be in a very raw state, but you can build downstream processes that consume these events and translate them into a meaningful format for other services to consume.
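The sketch below shows what one of those downstream processes might look like, using the kafka-python client. The topic names, broker address and the shape of the raw change events are all assumptions for illustration; the real event layout depends on the capture tool in use.

```python
import json

from kafka import KafkaConsumer, KafkaProducer

# Raw change events captured from the legacy transaction log arrive
# on one topic; we translate them into a meaningful shape and
# republish them for other services to consume.
consumer = KafkaConsumer(
    "legacy.raw-changes",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for message in consumer:
    change = message.value
    # Only row changes to the ORDERS table are of interest here.
    if change.get("table") != "ORDERS":
        continue
    producer.send("orders.updated", {
        "order_id": change["row"]["ORD_NO"],
        "status": change["row"]["ORD_STS"],
        "changed_at": change["ts"],
    })
```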

This “all in” streaming model allows you to de-couple a new service architecture completely from legacy platforms without having to model the exchange or synchronise development. These legacy systems are still responsible for data collection and legacy processing, but their data becomes freely available to new services.

This approach is particularly useful if you need to implement data-centric features such as cross-platform reporting, forecasts and analytics. It turns legacy data into a “tap” that you are free to draw off as and when you need, without direct dependencies on legacy platforms.
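Because the stream behaves as a shared tap, each new service draws from it independently. The illustrative consumer below, for instance, replays the translated topic from the beginning under its own consumer group to build a simple report; again, every name here is hypothetical.

```python
import json

from kafka import KafkaConsumer

# A reporting service draws off the same stream independently: its
# own consumer group means it neither interferes with, nor depends
# on, any other consumer of the data.
report_consumer = KafkaConsumer(
    "orders.updated",
    bootstrap_servers="localhost:9092",
    group_id="cross-platform-reporting",
    auto_offset_reset="earliest",  # replay history for a back-fill
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

status_counts = {}
for message in report_consumer:
    event = message.value
    status_counts[event["status"]] = status_counts.get(event["status"], 0) + 1
```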

Does this really solve the problem?

Bubble patterns and event streams require some work to establish but they can facilitate a path towards a more modernised architecture. That said, this is merely side-stepping the problem rather than confronting it directly.

Legacy systems are often an unsolvable problem. That’s why they tend to hang around for so long; it’s rarely for lack of will or inspiration. Every legacy platform usually has some evidence of a botched re-write lurking somewhere in its bowels.

You can at least make progress towards new development without the false promise of decomposition and replacement. Most importantly, any new services can be fully de-coupled from the inner workings of the legacy system. This provides a more pragmatic and sustainable approach to dealing with legacy platforms than the delusion of a re-write.