22 February 2013

A shared database is still an anti-pattern, no matter what the justification

Linking systems and processes together through a shared database might seem inviting at first. It makes it easy to synchronise data between applications and it utilises technology that everybody can understand.

However, if you take a long-term view there’s no real justification for using a shared database to facilitate application integration. Most shared databases turn into performance bottlenecks that encourage close-coupling as well as creating a single point of failure.

“It’s quick to develop”

It may be quick to develop a shared data store, but you want to ensure  that you can grow and maintain your applications over the long term.

A fundamental problem with the shared database is that it makes it difficult to define and enforce clear boundaries between systems.  It’s also difficult to define any kind of meaningful data schema that can be used by multiple applications. The end result is usually a design that is extremely difficult to work with and a melting pot of logic and data that doesn’t have any clear responsibilities.

As a shared database becomes entwined in multiple systems it becomes more difficult to make changes without causing regression. There are also more likely to be political difficulties as the number of stakeholders involved in the system multiply. You will reach a point where you are unable to refactor the database for critical systems because of the potential impact on secondary systems.

Even well-designed applications with carefully-constructed data abstractions will fall victim to this paralysis in time. They are particularly vulnerable to secondary applications with less disciplined data strategies where data is accessed directly. This crude access will undermine any carefully-planned abstractions and application-level concerns such as security or caching will be impossible to enforce. The potential for locks and deadlocks will increase as the database becomes a vast systems bottleneck.

“The functionality is all encapsulated in one place”

This feels a little like turning architectural best practice on its head.  The objective is not to establish a single source of all truth. Knowing more-or-less where everything resides is of little comfort when it’s an over-complicated mess that everybody is scared of touching for fear of breaking it.

Placing all your eggs in one basket doesn’t exactly help with resilience and scalability. A shared database encourages the growth of a “super system” in your architecture that attracts stray pieces of logic and data. If it goes down, your entire platform is pretty much offline until it recovers. It can quickly become a processing bottleneck as it is unable to scale in step with the myriad of different business processes that it supports.

“We all understand the technology”

The.Net framework’s data stack of Entity Framework and SQL Server makes it pretty easy to persist a data model. The problem is that developers often reach for it by default so that it becomes a “hammer for every nail”.

Take the example of using a shared database to implement a processing queue.  On a basic level it’s pretty easy to create a shared data store that keeps track of tasks and have processes poll the store for their next job. So long as you don’t have to deal with much load you’ll probably be fine. However, once you up the processing load you will start to run into trouble over concurrent updates and locking, polling that hammers your database into submission and a growing dump of completed task information that has to be cleaned up. Your simple solution that everybody can understand quickly becomes a flaky performance bottleneck.

The over-use of a generic technology can be a sign of a development culture that has stopped innovating. Developers ought to be keen to explore new technologies rather than attempting to fix every problem through the medium of C# and SQL code. That said, sometimes it can be difficult to know where to start without appropriate technical leadership.

“We didn’t know what else to use”

There’s a great set of essays around enterprise integration styles by Martin Fowler that almost traces a line of maturity from file transfer to messaging via shared databases and remote procedure invocation.

A shared database is a more convenient means of exchanging data than file transfer as it allows for immediate updates and can be used to assert a common data schema. Remote procedure invocation makes it easier to isolate and share functionality though does not eliminate coupling between systems. Messaging allows for more complete de-coupling and effective system collaboration at the cost of a greater learning curve.

Messaging-based patterns normally offer a more appropriate means of integrating systems and processes. Note that this does not guarantee an elegant solution and developers can often be left writing messy glue to make everything fit together. It also tends to involve greater complexity and development teams may have difficulty with concepts such as asynchronous processing. This learning curve is worth negotiating as the looser coupling afforded by messaging will always be preferable to the confused melting pot that is created by shared database integrations.

Filed under Architecture, Design patterns, Favourite posts, Integration, Rants, SOA.