25 August 2009

Developing EPiServer page providers: optimizing performance

EPiServer’s page providers are an enterprise-only feature that allows you to integrate external data sources into EPiServer’s page structure. They offer enormous potential for publishing information from a variety of sources across the enterprise in a common format.

The how-to of page providers is covered in a variety of places – the best place to start is EPiServer’s white paper on the subject, which contains an XML page provider sample. It all looks simple enough at the outset, but it’s only once you start developing implementations for “real-life” data sources that you realise how much coding is involved in a page provider implementation. Given that the feature is relatively new, there doesn’t appear to be much explanation of what really goes on under the hood or how to develop with efficiency and scalability in mind.

Having developed providers for relatively large (10,000+ records) data sources, I have put together some suggestions for things to bear in mind when developing your page provider.

Understanding what EPiServer caches – and what it doesn’t

EPiServer relies very heavily on the cache for performance, but it’s important to understand when EPiServer uses the cache and when it will attempt to query your provider’s data source. Once you have constructed a page in your provider’s GetLocalPage() method it will be placed in the EPiServer page cache and served from memory from then on. However, there are still a number of instances where you will need to refer to the underlying data source – this happens more often than you might think:

  • EPiServer caches page content, but it does not cache the content structure, i.e. the arrangement of parent and child pages. This means that the structure is constantly re-queried as you access pages, even if the page content is sitting in the page cache.
  • If you are looking at a page in edit mode then EPiServer does not retrieve the cached version of a page – it constructs the page from scratch every time.
  • If you want to provide property-based search on your page provider then you will have to implement FindPagesWithCriteria() yourself and plumb the criteria directly into your data source.
  • Any page updates will, of course, have to be routed back to your data source.

Despite the aggressive page caching, EPiServer does reference the data source for pages pretty regularly – i.e. every time you show a menu or do a search you will be hitting the data. This is worth bearing in mind if you access your data source via a relatively expensive operation such as a web service method.

The importance of an optimised data store

Given the fact that a page provider requires constant access to the data source, it’s worth caching the data in a structure that is optimised for the kinds of search operations that EPiServer will perform on it. The structure of your cached data is vital for effective provider performance – it is worth doing some performance tests to ensure that you’ve got the optimum structure for the following types of page search (in descending order of frequency):

  1. Searching for a page based on a page reference – this is by far the most common
  2. Searching for a page based on its parent’s page reference
  3. Finding a page based on its GUID
  4. Property-based searches using FindPagesWithCriteria() – if you’re implementing search.

For a very large data set, my solution was to encapsulate the underlying data for each page in a class and cache those objects in a generic dictionary keyed by PageReference, i.e. Dictionary<PageReference, MyPageClass>. This structure provides super-fast lookups of pages by PageReference, while other, less frequent searches can be carried out in reasonable time using LINQ.
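As a rough sketch of this structure, the following uses stand-in types so that it compiles without the EPiServer assemblies. PageKey is a hypothetical substitute for PageReference (an ID plus a provider name), and the members of MyPageClass and the PageStore class are illustrative assumptions rather than anything taken from the EPiServer API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for EPiServer's PageReference: an ID plus the name
// of the provider that owns it. Value equality makes it usable as a key.
public struct PageKey : IEquatable<PageKey>
{
    public readonly int Id;
    public readonly string Provider;

    public PageKey(int id, string provider) { Id = id; Provider = provider; }

    public bool Equals(PageKey other) { return Id == other.Id && Provider == other.Provider; }
    public override bool Equals(object obj) { return obj is PageKey && Equals((PageKey)obj); }
    public override int GetHashCode() { return Id ^ (Provider == null ? 0 : Provider.GetHashCode()); }
}

// Illustrative container for the raw data behind a single page.
public sealed class MyPageClass
{
    public PageKey Link { get; set; }
    public PageKey Parent { get; set; }
    public Guid PageGuid { get; set; }
    public string Name { get; set; }
}

public sealed class PageStore
{
    private readonly Dictionary<PageKey, MyPageClass> _pages;

    public PageStore(IEnumerable<MyPageClass> pages)
    {
        _pages = pages.ToDictionary(p => p.Link);
    }

    // Most frequent lookup: a direct hash lookup by page reference.
    // Returns null when the page is not in the cache.
    public MyPageClass Find(PageKey link)
    {
        MyPageClass page;
        return _pages.TryGetValue(link, out page) ? page : null;
    }

    // Less frequent lookups scan the cached values with LINQ.
    public List<MyPageClass> ChildrenOf(PageKey parent)
    {
        return _pages.Values.Where(p => p.Parent.Equals(parent)).ToList();
    }

    public MyPageClass FindByGuid(Guid guid)
    {
        return _pages.Values.FirstOrDefault(p => p.PageGuid == guid);
    }
}
```

If the parent or GUID lookups turn out to be a bottleneck in your own measurements, a second dictionary keyed on the parent or the GUID is an obvious next step, at the cost of more memory and more work when the cache is rebuilt.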

Caching your data source in an optimised data structure does break the direct link between EPiServer and the data source, but in cases with large amounts of page data served by slow interfaces it is pretty necessary. You are also creating processing overhead when you create your cache – this can be mitigated by caching the structure and refreshing it periodically through a scheduled task.
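One way to handle the periodic refresh, sketched here under assumptions, is to build the new cache off to one side and then publish it with a single reference swap, so that requests never see a half-built structure. SnapshotCache and its load delegate are hypothetical names. How Refresh() gets called (an EPiServer scheduled job, a timer, or something else) is left open.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Illustrative cache holder: readers always see a complete snapshot, and a
// refresh replaces the whole snapshot in one step.
public sealed class SnapshotCache<TKey, TValue>
{
    private readonly Func<Dictionary<TKey, TValue>> _load;
    private Dictionary<TKey, TValue> _current;

    public SnapshotCache(Func<Dictionary<TKey, TValue>> load)
    {
        _load = load;
        _current = load();
    }

    // Rebuild the data separately (the slow part), then swap the reference.
    // A published dictionary is never modified afterwards, so concurrent
    // readers are safe.
    public void Refresh()
    {
        Dictionary<TKey, TValue> fresh = _load();
        Interlocked.Exchange(ref _current, fresh);
    }

    public bool TryGet(TKey key, out TValue value)
    {
        return Volatile.Read(ref _current).TryGetValue(key, out value);
    }
}
```

The trade-off is that two copies of the data exist briefly during a refresh, which is usually acceptable for tens of thousands of records.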

GUIDs and unique IDs

In order to support EPiServer’s internal linking you will have to maintain a permanent mapping between each page served by your provider, a unique integer-based ID and a unique GUID. If your data source provides the ID and GUID values then you’re in luck, otherwise you will have to develop a mechanism that creates these unique ID and GUID values and persists them against a page.

There are no short-cuts to creating a GUID for every page. The EPiServer community has suggested a Guid-less provider as a solution to creating GUIDs for every page, but given that this involves hacking the bits in a GUID it is not recommended for live systems as it undermines the whole point of GUIDs – i.e. that they should be unique.

This requirement for persisting ID and GUID values makes the idea of a pattern based on an optimised data structure even more attractive on performance grounds. Ultimately, you are likely to be storing these values in a database table, so caching them in memory through your optimised page structure will save you an untold number of unnecessary data reads whenever EPiServer tries to look up a GUID on the basis of a page ID and vice versa.
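A minimal sketch of such an in-memory mapping, assuming the ID and GUID pairs have already been read from wherever they are persisted, keeps two dictionaries so that lookups are cheap in both directions. IdentityMap is a hypothetical helper, not an EPiServer type.

```csharp
using System;
using System.Collections.Generic;

// Illustrative two-way index over persisted (ID, GUID) pairs. Loading the
// pairs from the database table is assumed to happen elsewhere.
public sealed class IdentityMap
{
    private readonly Dictionary<int, Guid> _guidById = new Dictionary<int, Guid>();
    private readonly Dictionary<Guid, int> _idByGuid = new Dictionary<Guid, int>();

    public void Add(int id, Guid guid)
    {
        // Reject duplicates up front so both dictionaries stay in step.
        if (_guidById.ContainsKey(id) || _idByGuid.ContainsKey(guid))
        {
            throw new ArgumentException("Duplicate ID or GUID in mapping.");
        }
        _guidById.Add(id, guid);
        _idByGuid.Add(guid, id);
    }

    public bool TryGetGuid(int id, out Guid guid)
    {
        return _guidById.TryGetValue(id, out guid);
    }

    public bool TryGetId(Guid guid, out int id)
    {
        return _idByGuid.TryGetValue(guid, out id);
    }
}
```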

Cache the page references, not the IDs

The golden rule with page providers is to always work with PageReference objects, never with the raw integer-based IDs. EPiServer constructs a page reference from a combination of the provider name and an ID that is unique only within that provider – the ID value on its own is not unique across the site.

If you want to compare page identifiers then you must use the base class’s ConstructPageReference() helper method to form a PageReference before performing the comparison. The overhead of always having to construct a PageReference may appear small, but over thousands of repetitive operations it does start to add up. It makes a lot of sense to cache the PageReference for each individual page to save yourself countless method calls.

Consider providing read-only access

Adding support for creating, updating and deleting content doesn’t necessarily have to impact on the performance of your provider. After all, page updates are comparatively rare operations so you can afford to suffer a performance hit on updates in order to maximise the performance of read operations.

That said, there’s no point writing more code than you really have to. In order to save yourself the development overhead of writing in the plumbing for updating your data source it’s worth taking a long, hard look at whether or not you really need to provide data updates through EPiServer. The key question here is what role is EPiServer playing in your information architecture, i.e. is it there primarily to publish web content? Most optimisation scenarios don’t necessarily need a two-way page provider integration as other systems take responsibility for managing the life cycle of the underlying content.

Security descriptors

EPiServer’s XML Page Provider sample contains an interesting custom security descriptor pattern. This allows the security settings for each page to be set by the data source and passed in to each individual PageData object when they are created.

Managing the security settings for each individual page in your provider can be quite an overhead. If an individual page does not have an ISecurityDescriptor object defined for it then the provider should check the parent page for the security information.

This overhead may be unavoidable, but if you are applying the same security settings across every page in your provider then it is worth caching a custom security descriptor in your provider and applying it to every page. This does provide a considerable performance boost as you are not having to do the work of checking parent pages to find and apply security settings.

Making life easier for editors

When you are serving up thousands of pages in a flat, single-level structure, this can cause real performance issues for content editors. When you click on a node in EPiServer, it will query the structure below that node and for the level below that – which is quite a lot of work when you have 10,000 pages. If a user accidentally clicks on a node containing thousands of provider pages then they will have a long wait on their hands while EPiServer populates the page tree.

This does call for some automatic sorting for your provider content if you are serving up more than 500 pages through a provider. If you are constructing a cached content structure then you can also write in a system of category pages for the provider – sorting pages into at least two levels in this way will help to protect your content editors from some very frustrating load times.
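As an illustration of the sorting step only, the sketch below buckets page names by their first character, which is just one possible scheme (date or source category may suit your data better). CategoryBuilder and GroupByInitial are hypothetical names. Turning each bucket into an actual category page in the provider is not shown.

```csharp
using System;
using System.Collections.Generic;

public static class CategoryBuilder
{
    // Buckets page names by their upper-cased first character. Null or
    // empty names go under "#". Buckets come back ordered by key.
    public static SortedDictionary<string, List<string>> GroupByInitial(IEnumerable<string> pageNames)
    {
        var buckets = new SortedDictionary<string, List<string>>(StringComparer.Ordinal);
        foreach (string name in pageNames)
        {
            string key = string.IsNullOrEmpty(name)
                ? "#"
                : name.Substring(0, 1).ToUpperInvariant();

            List<string> bucket;
            if (!buckets.TryGetValue(key, out bucket))
            {
                bucket = new List<string>();
                buckets.Add(key, bucket);
            }
            bucket.Add(name);
        }
        return buckets;
    }
}
```

Each bucket then becomes a category page with the real pages as its children, so an editor expanding the provider's root node only loads the category level and the level below it.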

The caveat – search

Finally, beware – EPiServer’s text-based search does not work with page providers. This makes sense when you consider how EPiServer’s search works, but it does have a big impact on your site. If you are going to make heavy use of page providers then you will also have to consider implementing a third-party search solution if you want your provider content to be included in a site-wide text search. Spidering-based solutions such as Google Mini will do the job here, but you are adding cost and complexity to your project.

Filed under ASP.NET, CMS.