8 January 2012
Entity Framework anti-patterns: How not to use an ORM with SQL Server
The Entity Framework has come a long way since its earliest incarnations prompted a vote of no confidence from the development community. Despite its growing maturity, the framework still suffers from problems that often stem from naive implementations.
Some of these problems are related to limitations inherent in Object/Relational Mapping (ORM) tools. They provide only partially successful abstractions of relational databases and tend to be complex to implement, mainly because they are trying to solve a very difficult problem: synchronising two very different representations of data that are used in two very different contexts.
This need not be a barrier to using ORM technologies successfully. The Entity Framework can be a great tool for rapidly building out a data access layer so long as you are aware of its limitations and understand what is going on under the bonnet.
An abstraction too far
One of the strengths of ORM tools is that they provide developers with an abstraction that makes it easier to write data access code. However, this can be an abstraction too far: it is a mistake to think that you don’t have to worry about what’s happening at the database level. Many ORM implementations run into trouble because of naive interactions with the underlying database, and what is convenient in a development environment can backfire and become a major bottleneck once it goes live.
Using an ORM tool does not excuse you from the need for tuning database performance. Developers should take care with the LINQ statements they write, as the Entity Framework won’t always map them to a sensible query. There comes a point at which another option, such as a custom stored procedure or database view, should be considered.
Most DBAs would prefer data access to be controlled through stored procedures rather than relying on framework-generated SQL. Not only do stored procedures tend to perform better, but they are also easier to secure, as the consuming account does not have to be given direct access to data tables.
Inappropriate usage scenarios
One of the more common performance complaints about the Entity Framework is that it is slow for bulk updates. The Entity Framework is primarily designed to make it easier to wire up CRUD operations and simple retrieval scenarios, where entities are loaded, changed and saved individually. It was not designed with scenarios such as bulk processing or reporting in mind.
It’s always best to use the right tool for the right job.
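As a rough illustration of why row-at-a-time processing is the wrong tool for bulk work, the sketch below contrasts an ORM-style loop with a single set-based statement. It uses Python with an in-memory SQLite database purely as a stand-in for the Entity Framework and SQL Server. The `product` table and all function names are hypothetical.

```python
import sqlite3


def make_db(rows=1000):
    """Build a throwaway in-memory database (a stand-in for SQL Server)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product (id INTEGER PRIMARY KEY, price REAL NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO product (id, price) VALUES (?, ?)",
        [(i, 10.0) for i in range(1, rows + 1)],
    )
    conn.commit()
    return conn


def raise_prices_row_by_row(conn, factor):
    """ORM-style: load every row, change it in memory, send one UPDATE per row."""
    statements = 0
    rows = conn.execute("SELECT id, price FROM product").fetchall()
    for product_id, price in rows:
        conn.execute(
            "UPDATE product SET price = ? WHERE id = ?",
            (price * factor, product_id),
        )
        statements += 1
    conn.commit()
    return statements  # number of UPDATE statements issued


def raise_prices_set_based(conn, factor):
    """Set-based: one UPDATE statement; the database does the work."""
    cursor = conn.execute("UPDATE product SET price = price * ?", (factor,))
    conn.commit()
    return cursor.rowcount  # number of rows changed by the single statement
```

Both functions leave the data in the same state, but the first sends one statement per row while the second sends one statement in total. Against a real server each of those statements is a network round trip, which is where the bulk update complaints tend to come from.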
Inefficient fetching strategies
When you load an entity in the Entity Framework you have to decide whether or not to load any related entities at the same time. Ideally, you should only fetch the data you really need in order to minimise the amount of unnecessary data querying.
In practice it can be difficult to define those objects that should be loaded lazily and those that should be loaded more eagerly. A mixed solution can become a confusing source of bugs and it can be very difficult to define a consistent strategy that meets every usage scenario.
Ultimately, you need flexibility over your fetching strategy and a one-size-fits-all approach is likely to be inefficient. Some approaches advocate defining an abstracted fetching strategy that can be passed into data methods, but this risks peppering your data access code with implementation detail.
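The cost of picking the wrong fetching strategy is easiest to see by counting queries. The sketch below is not Entity Framework code: it uses Python and an in-memory SQLite database as a stand-in, with hypothetical `orders` and `order_line` tables and a hypothetical `CountingConnection` wrapper. It assumes a simple parent and child relationship.

```python
import sqlite3


class CountingConnection:
    """Thin wrapper that counts the queries sent through query()."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def seed(db, orders=5, lines_per_order=3):
    """Create the hypothetical schema and some sample rows (not counted)."""
    db.conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
    db.conn.execute("CREATE TABLE order_line (order_id INTEGER, sku TEXT)")
    for order_id in range(1, orders + 1):
        db.conn.execute("INSERT INTO orders (id) VALUES (?)", (order_id,))
        for n in range(lines_per_order):
            db.conn.execute(
                "INSERT INTO order_line (order_id, sku) VALUES (?, ?)",
                (order_id, f"SKU-{n}"),
            )


def load_lazily(db):
    """One query for the parents, then one more per parent (the N+1 pattern)."""
    result = {}
    for (order_id,) in db.query("SELECT id FROM orders ORDER BY id"):
        lines = db.query(
            "SELECT sku FROM order_line WHERE order_id = ? ORDER BY rowid",
            (order_id,),
        )
        result[order_id] = [sku for (sku,) in lines]
    return result


def load_eagerly(db):
    """A single joined query that returns parents and children together."""
    result = {}
    rows = db.query(
        "SELECT o.id, l.sku FROM orders o "
        "LEFT JOIN order_line l ON l.order_id = o.id "
        "ORDER BY o.id, l.rowid"
    )
    for order_id, sku in rows:
        lines = result.setdefault(order_id, [])
        if sku is not None:  # an order with no lines still appears, with sku NULL
            lines.append(sku)
    return result
```

With five orders the lazy version issues six queries and the eager version issues one, yet both return the same data. The eager version pays for that by returning wider, repeated rows, which is why neither strategy suits every usage scenario.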
Loose coupling is often more difficult to achieve than it may first appear. It is time consuming to implement, can create complexity and the benefits are not immediately obvious in a development project. That said, without loose coupling an application can become almost impossible to maintain over the long term.
These maintenance difficulties become particularly acute when different tiers of an application are closely bound together. Separate application tiers do not always evolve at the same rate and you are likely to want to be able to change particular implementations without creating a ripple effect throughout the system.
The Entity Framework does help in creating an abstraction between the physical database and data access code, but it’s important to isolate it from any other tiers in the application stack. A common anti-pattern is to expose Entity Framework entity objects directly to client applications, as this tightly couples the Entity Framework to the rest of the solution.
An application that interacts with your data should not need to know anything about the data access technology. The litmus test is whether your client applications need a reference to the Entity Framework to work. If you isolate your data access technology from your client applications then you are free to evolve its implementation without creating any unnecessary dependencies.
The Entity Framework relies on a context object to provide data access and the management of these objects is a common source of problems. Given that contexts can provide facilities to track changes to entities, the temptation can be to keep them open for longer than you need to.
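One way to picture this isolation is a small interface that returns plain data objects, so that only one class knows which data access technology sits behind it. The sketch below uses Python with SQLite as a stand-in for the Entity Framework. `Customer`, `CustomerRepository` and the other names are hypothetical and chosen only for illustration.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Customer:
    """Plain data object with no reference to the data access technology."""
    id: int
    name: str


class CustomerRepository(Protocol):
    """The only contract that client code is allowed to depend on."""

    def get(self, customer_id: int) -> Optional[Customer]: ...


class SqlCustomerRepository:
    """The one place that knows a database is involved."""

    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn

    def get(self, customer_id: int) -> Optional[Customer]:
        row = self._conn.execute(
            "SELECT id, name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        return Customer(*row) if row else None


class InMemoryCustomerRepository:
    """Alternative implementation, e.g. for tests; clients cannot tell the difference."""

    def __init__(self, customers):
        self._by_id = {c.id: c for c in customers}

    def get(self, customer_id: int) -> Optional[Customer]:
        return self._by_id.get(customer_id)


def greeting(repo: CustomerRepository, customer_id: int) -> str:
    """Client code: uses the interface and the plain Customer type only."""
    customer = repo.get(customer_id)
    return f"Hello, {customer.name}" if customer else "Unknown customer"
```

The `greeting` function passes the litmus test: it would keep working unchanged if the SQL-backed class were replaced, because it never sees a connection, a context or a tracked entity.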
This is an anti-pattern as you should aim to use a context for a single operation and drop it as soon as you are finished. Without close control of the database context there is a risk that you will develop serious concurrency problems as your contexts start to conflict with each other.
This problem can be countered by using the Unit of Work pattern along with a Repository to regulate access to the database context. The Repository pattern exposes a consistent interface to consuming applications that is abstracted from the Entity Framework. The Unit of Work pattern in turn can be used to ensure the proper management of a single, consistent data context. Used together, these patterns help to ensure that you keep track of data operations and manage the way they are written to the database in a consistent way.
In a client-server solution, real problems can set in if you try to keep your context alive between different service calls. If you associate a context with a session you can track changes to entities by keeping the context alive between requests.
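A minimal sketch of how the two patterns can fit together is shown below. It is written in Python against SQLite, not the Entity Framework, and a database connection plays the role of the context. `UnitOfWork`, `AccountRepository`, `transfer` and the `account` table are all hypothetical names.

```python
import sqlite3


class UnitOfWork:
    """Owns one connection for one business operation, then disposes of it."""

    def __init__(self, database_path):
        self._path = database_path
        self.conn = None

    def __enter__(self):
        self.conn = sqlite3.connect(self._path)
        return self

    def __exit__(self, exc_type, exc, tb):
        try:
            if exc_type is None:
                self.conn.commit()    # every change in the operation is saved together
            else:
                self.conn.rollback()  # or none of them are
        finally:
            self.conn.close()         # the connection never outlives the operation
            self.conn = None
        return False                  # never swallow the caller's exception


class AccountRepository:
    """Hides the SQL; borrows its connection from the current unit of work."""

    def __init__(self, uow):
        self._uow = uow

    def balance(self, account_id):
        row = self._uow.conn.execute(
            "SELECT balance FROM account WHERE id = ?", (account_id,)
        ).fetchone()
        return row[0]

    def adjust(self, account_id, amount):
        self._uow.conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?",
            (amount, account_id),
        )


def transfer(database_path, source, target, amount):
    """One operation, one unit of work: both updates succeed or neither does."""
    with UnitOfWork(database_path) as uow:
        accounts = AccountRepository(uow)
        accounts.adjust(source, -amount)
        if accounts.balance(source) < 0:
            raise ValueError("insufficient funds")
        accounts.adjust(target, amount)
```

The design choice here is that the repository never opens or closes anything itself. The unit of work decides when the connection is created, when changes are written and when everything is dropped, which keeps the lifetime of the context explicit and short.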
This approach may seem straightforward, but it creates a number of problems. For starters, managing contexts between service calls is a complex undertaking and it can create serious scalability issues as you spin up a new set of resources to service each client.
Keeping a client associated with a particular context object is an unnecessary overhead when a completely stateless approach will be easier on resources and give rise to a far less complex solution. If there is any information that needs to be persisted between service calls then you should consider managing this on the client rather than trying to maintain an affinity between sessions and contexts.
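The sketch below shows one possible shape for a stateless pair of service calls. It assumes a version column is used to detect conflicting edits, which is only one way of letting the client carry the state between calls. As before, Python and SQLite stand in for the Entity Framework and SQL Server, and `get_customer`, `save_customer` and the `customer` table are hypothetical.

```python
import sqlite3
from contextlib import closing


def get_customer(database_path, customer_id):
    """Service call 1: open, read, close. Nothing is kept alive on the server."""
    with closing(sqlite3.connect(database_path)) as conn:
        row = conn.execute(
            "SELECT id, name, version FROM customer WHERE id = ?",
            (customer_id,),
        ).fetchone()
    # The client holds this plain dictionary, including the version it read.
    return {"id": row[0], "name": row[1], "version": row[2]}


def save_customer(database_path, customer):
    """Service call 2: a fresh connection applies the client's copy.

    Returns False if the row was changed by someone else since it was read.
    """
    with closing(sqlite3.connect(database_path)) as conn:
        cursor = conn.execute(
            "UPDATE customer SET name = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (customer["name"], customer["id"], customer["version"]),
        )
        conn.commit()
        return cursor.rowcount == 1
```

Because each call opens and closes its own connection, any server can handle any request and no resources are tied up waiting for a client to come back. The version number that travels with the client does the job that a long-lived, change-tracking context would otherwise be doing.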