10 November 2010
ORM wars: Comparing nHibernate, LINQ To SQL & the Entity Framework
One of the more enduring problems of data-related development is bridging the gap between relational data storage and object-based programming models. These are two very different approaches to data which do not blend seamlessly.
Object-relational mapping (ORM) attempts to solve this mismatch by providing a translation layer between relations and objects, but this is no silver bullet. Ted Neward described ORM as the "Vietnam of computer science" in that it's a quagmire that starts well, gets more complicated as time passes and ends up as an open-ended commitment with poorly-defined goals and no clear exit strategy.
There is no perfect solution to the object/relational mapping problem. There are some solutions - and they can be very useful - but any solution inevitably involves some trade-offs that do not necessarily become clear until much later in the development cycle.
Developers and architects can get pretty evangelical and emotive about their favorite approach to data access. Some choose to abandon ORM altogether and retreat to a programming model that doesn't rely on objects. After all, a rich domain model for your data may look neat, but it doesn't always deliver a return on investment. Others go the whole hog and abandon relational data storage entirely, relying on object-based data storage.
The rest of us are stuck somewhere in the middle, using code to marshal a relational schema into an object model. You can hand-tool this mapping by writing a lot of code, but a number of ORM frameworks are available to do the plumbing for you. This is where things can get pretty subjective - there is little objective comparison available for the different tools and technologies.
This article examines some of the main trade-offs involved in selecting an ORM approach for .NET. There is a multitude of different frameworks out there, but I will be concentrating on the following four as they are well established and widely used:
- Good-old, hand-tooled data access using ADO.NET
- Generating an ORM layer using nHibernate
- Using LINQ to SQL to pull out data into an object model
- Using Microsoft's Entity Framework 4.0
Control and maintenance
Hand-tooled data access using ADO.NET gives you the greatest degree of control over your data access, particularly when it comes to the SQL that is being executed. This approach lets you write finely-honed, high-performing code that is specifically tailored to your system. However, the downside is the sheer quantity of code that you have to write, most of it being tedious plumbing. All this extra hand-tooled code can be a hiding place for a lot of small bugs too, and getting your hand-tooled data access stable is more time-consuming than you might think.
ADO.NET can be a great candidate when you really need fast data access to a relatively small data model. The catch is that over time a small and simple data model tends to become a large and complex data model and you are stuck with a data access strategy that requires a lot of effort to maintain and enhance.
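To make the "tedious plumbing" concrete, here is a minimal sketch of hand-tooled data access. It is not ADO.NET: it uses Python's built-in sqlite3 module purely so that it runs without any setup, and the Customer class, the customer table and the load_customers function are all hypothetical names invented for the illustration. The shape is the same as a hand-written command-and-reader loop, though: you write the SQL yourself and copy each column into the object by hand.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical domain object, invented for this illustration.
    id: int
    name: str
    email: str


def load_customers(conn: sqlite3.Connection, min_id: int) -> list[Customer]:
    # Hand-written SQL: full control over exactly what is executed.
    sql = "SELECT id, name, email FROM customer WHERE id >= ? ORDER BY id"
    rows = conn.execute(sql, (min_id,)).fetchall()
    # The plumbing: every column is copied into the object by hand and
    # must be kept in step with the schema whenever it changes.
    return [Customer(id=r[0], name=r[1], email=r[2]) for r in rows]


def demo() -> list[Customer]:
    # Assumed schema and sample data, created in memory for the example.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer (id, name, email) VALUES (?, ?, ?)",
        [(1, "Ada", "ada@example.com"), (2, "Grace", "grace@example.com")],
    )
    return load_customers(conn, 2)
```

Every new column or table means another query string and another mapping line like the ones above, which is where the maintenance cost comes from as the model grows.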
The other approaches all take care of much of the dirty work of data access for you, though they do this in very different ways. nHibernate uses XML to map data tables to data objects and data access is managed for you by a central Session object. This does hide the details of SQL from you and means you have much less code to look after, but nHibernate does give you the feeling that you are swapping one maintenance headache for another. The use of XML definition files does tend to create "configuration hell" for larger systems - though this can be mitigated by management tools - and nHibernate's reliance on a loosely-typed querying syntax can be a common source of bugs.
LINQ to SQL isn't necessarily a "full-fledged" ORM tool and is more of a fast way of accessing a database. It automatically generates a set of objects that directly map onto your data tables and many developers use LINQ to SQL in conjunction with LINQ to Objects to map these database objects onto entities. This approach does facilitate proper mapping of data tables to objects, but it tends to create a large slug of mapping code which can be difficult to maintain. Developers like LINQ to SQL though as it generates data objects very quickly and is easy to get started with - it also provides some pretty good visual tools in Visual Studio.
The Entity Framework is Microsoft's fully-fledged ORM tool that is based on an Entity Data Model that is represented as XML and normally embedded in an assembly. The XML-based configuration isn't something you necessarily have to deal with directly as the Entity Framework provides some rich visual tools for managing the mapping. One of the strongest features of the Entity Framework is its strong typing and support for compile-time model checking. This makes for the lowest maintenance overhead of any of the platforms being considered here.
Performance
As ADO.NET provides you with the greatest level of control over your SQL and code execution it is a good candidate for smaller domain models where data access has to be fast. The problem is that small data models have a tendency to grow over time so that you can easily find yourself maintaining a large rig with a data access strategy that isn't scaling well. You may need fast data access, but you should question exactly how fast this really needs to be and whether the potential performance gain is really worth the extra long-term maintenance effort.
You can do a lot to improve the performance of the other approaches, but it does require in-depth knowledge of the technology and a bit of forward planning. They all support using SQL stored procedures for data operations rather than formatting raw SQL on the fly. They also provide different features for tuning performance in specific circumstances. For example, LINQ to SQL's compiled queries can boost regularly used operations and nHibernate's support for batching read and write statements provides a further level of control over database operations.
In general, you should take care not to use a blanket approach to every data operation and really know what is going on under the hood so that you do not accidentally introduce performance bottlenecks. Bear in mind that no matter what your data access approach is, the "usual rules" will always apply, i.e. you shouldn't work with any more data than you really need and you should always look to minimise the number of round trips to the database server.
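The round-trip rule is easy to break without noticing, whichever framework is generating the SQL. The sketch below is framework-neutral and again uses Python's built-in sqlite3 module so that it runs as-is; the CountingConnection wrapper, the tables and the function names are all hypothetical, and an in-memory query count simply stands in for round trips to a real database server. It contrasts loading totals with one query per customer against a single joined query.

```python
import sqlite3


class CountingConnection:
    """Wraps a connection and counts queries, standing in for round trips."""

    def __init__(self, conn):
        self._conn = conn
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self._conn.execute(sql, params).fetchall()


def build_db():
    # Assumed schema and sample data for the illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?)", [(1, "Ada"), (2, "Grace"), (3, "Linus")]
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)", [(1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5)]
    )
    return conn


def totals_one_query_per_customer(db):
    # One query for the customers, then one more per customer.
    totals = {}
    for cid, name in db.query("SELECT id, name FROM customer ORDER BY id"):
        rows = db.query(
            "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer_id = ?", (cid,)
        )
        totals[name] = rows[0][0]
    return totals


def totals_single_query(db):
    # The same result from a single joined, aggregated query.
    rows = db.query(
        "SELECT c.name, COALESCE(SUM(o.total), 0) "
        "FROM customer c LEFT JOIN orders o ON o.customer_id = c.id "
        "GROUP BY c.id, c.name ORDER BY c.id"
    )
    return dict(rows)
```

With three customers the first function issues four queries and the second issues one, and both return the same totals. The gap widens with every extra row, which is why it pays to look at the SQL your chosen framework actually produces.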
Coherence with the .NET framework
One of the perceived advantages of ORM is that it allows developers to work in a common object-orientated paradigm that they generally feel more comfortable with, rather than having to work directly with SQL and relational data. In order to fully realise the productivity gains to be made here a framework should fit closely with the most commonly-used patterns in the .NET framework so it feels like a natural extension.
This is one of the Entity Framework's trump cards as it is very closely integrated with other parts of the .NET framework, including areas such as ADO.NET Data Services. LINQ to SQL does not benefit from this degree of integration and given that nHibernate has its roots in the Java world, it does not fit snugly with the .NET framework. Until pretty recently nHibernate did not provide decent support for commonly-used paradigms such as LINQ and IQueryable, which made it more of a struggle for .NET developers to start working with.
Earlier incarnations of the Entity Framework came in for a lot of criticism because they tended to force you into a particular design pattern and did not provide support for POCO objects. The data layer generated by the Entity Framework tended to force framework dependencies into every part of the data tier, making unit tests almost impossible to implement properly. LINQ to SQL has a similar problem in that it is too closely coupled to the underlying framework to allow for genuinely atomic tests to be written against it.
Testability
nHibernate and ADO.NET have always provided good support for testability as they allow you total control over your code design. The most recent version of the Entity Framework has addressed this issue by exposing the code generation mechanism which is driven by T4 templates. Templates are now available for a variety of class design strategies, including fully testable POCO classes.
Design approach and modelling capability
Which should come first, the relational design or the object design? In many cases you won't get a choice - there will be a database design in place that you have to interface with. In this case the approach is data-driven and you need a framework that will map seamlessly onto any design that you are presented with. Many architects prefer to design a data tier in terms of objects rather than relational data tables, so a technology that supports an object-led approach will afford you greater flexibility.
ADO.NET does not really provide you with any modelling support in either direction, so you are free to design (as badly) as you wish. Likewise, LINQ to SQL is a technology that only maps onto a database schema, so you can take any design approach you want but the technology will not actually assist you in any meaningful way. nHibernate can also be used with both approaches and tends to be favoured by those who prefer an object-led approach to data design. That said, it offers little direct support for either approach, but at least it does not hinder them.
The most recent incarnation of the Entity Framework provides more direct tool-based support for both approaches. The database mapping is smooth and automatic, while the "code first" feature allows you to start with your object model and generate a database structure from it. In terms of design flexibility and tool-set support for schema design it certainly has the edge over nHibernate.
Maturity and functionality
nHibernate has been around for a long time and has evolved to provide a very rich feature set. It provides support for a number of important features for scaling data access that you will not find in other frameworks, such as batching your database reads and writes to limit the number of round trips to the server. It also has a number of extension projects associated with it, such as nHibernate Search, nHibernate Validator and nHibernate Shards - these kinds of extension projects are unlikely to emerge for the Entity Framework.
The latest incarnation of the Entity Framework has addressed a number of shortcomings that made it a poor relation to nHibernate in a functional sense. It now supports lazy loading, i.e. deferring the loading of related data until it is actually accessed, which lets you control the amount of related data that is loaded with each object. However, it still feels like a relatively new technology.
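For readers who have not met the term, lazy loading can be sketched in a few lines. This is a language-neutral illustration in Python using the built-in sqlite3 module, not the Entity Framework's or nHibernate's actual mechanism; the Customer class, its loads counter and the orders table are hypothetical names made up for the example. The related data is fetched only when the property is first read, and then reused.

```python
import sqlite3


class Customer:
    """Hypothetical entity whose orders are fetched only on first access."""

    def __init__(self, conn, customer_id, name):
        self._conn = conn
        self.id = customer_id
        self.name = name
        self._orders = None  # related data not loaded yet
        self.loads = 0       # how many times the database was queried for orders

    @property
    def orders(self):
        if self._orders is None:
            # First access: go to the database and remember the result.
            self.loads += 1
            rows = self._conn.execute(
                "SELECT total FROM orders WHERE customer_id = ? ORDER BY id",
                (self.id,),
            ).fetchall()
            self._orders = [r[0] for r in rows]
        return self._orders


def demo():
    # Assumed schema and sample data for the illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)", [(1, 1, 10.0), (2, 1, 5.0)]
    )
    customer = Customer(conn, 1, "Ada")
    loads_before = customer.loads  # nothing has been fetched at this point
    first = customer.orders        # triggers the query
    second = customer.orders       # served from the cached list
    return loads_before, first, second, customer.loads
```

The convenience cuts both ways: touching a lazy property inside a loop over many objects quietly issues one query per object, so it is worth knowing when your framework loads related data eagerly and when it does not.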
Support and longevity
ADO.NET is a good long-term bet, being a major part of Microsoft's framework, but it is light on functionality and Microsoft are leaving it where it is while they extend other parts of the framework. LINQ to SQL, on the other hand, seems to have been superseded by the Entity Framework and Microsoft do not have any plans to develop the functionality in the future. Much of the LINQ to SQL community has been left feeling "high and dry" by this development, having invested a great deal of effort into the technology only to see it drop down Microsoft's list of priorities.
This can be a risk with Microsoft technologies and care should be taken before selecting a technology as you are making a decision that you will have to live with for a number of years. The Entity Framework looks like a fairly safe bet at the moment, particularly given the fact that Microsoft have addressed many of its earlier shortcomings in the most recent release. nHibernate has a large development community behind it and is continuing to mature as a technology, but much of this support may have arisen out of the lack of any serious ORM technology being provided by Microsoft. It remains to be seen how nHibernate will fare in the future once more people adopt the Entity Framework.