12 December 2013

Avoiding temporary failure errors with Azure’s distributed memory cache service

The API for Azure’s distributed memory cache service is simple enough, but if you don’t manage access to it carefully you may fall victim to intermittent “temporary failure” errors.

The message is verbose enough, but it doesn’t necessarily help you pin down the cause:

There is a temporary failure. Please retry later. (One or more specified cache servers are unavailable, which could be caused by busy network or servers. For on-premises cache clusters, also verify the following conditions. Ensure that security permission has been granted for this client account, and check that the AppFabric Caching Service is allowed through the firewall on all cache hosts. Also the MaxBufferSize on the server must be greater than or equal to the serialized object size sent from the client.)

These messages are both intermittent and irritating. A full and enduring solution requires that you address two different concerns:

  • Managing the life cycle of the DataCache and DataCacheFactory properly
  • Handling the kind of transient errors that always happen in a cloud-based service

Managing object life cycles

Many Azure APIs are something of a black box and the caching API is no exception. The DataCacheFactory class creates the connection to the cache, and its GetDefaultCache method returns a DataCache object that exposes all the cache operations.

However, it seems that you need to keep a reference to a live, undisposed DataCacheFactory object for the cache methods to work. If you dispose of the factory as soon as it has created the cache object then you will get temporary failure errors. The code example below may look sensible enough, but the Get method will fail:

DataCache cache;
using (DataCacheFactory factory = new DataCacheFactory())
{
    cache = factory.GetDefaultCache();
} // the factory is disposed here

// The cache object has outlived its factory, so this call fails
object fail = cache.Get("KEY");

It appears, therefore, that the factory has an ongoing role in connection management and needs to be kept alive. Ideally, access to the cache should go through a singleton so that a single connection to the cache is maintained for the lifetime of the application. Using Microsoft’s guidance for building a singleton, a basic cache service class would look like this:

public sealed class CacheService
{
    private static volatile CacheService _instance;
    private static readonly object syncRoot = new Object();

    // The factory is held for the lifetime of the singleton and never disposed
    private DataCacheFactory _cacheFactory;
    private DataCache _cache;

    private CacheService()
    {
        this.CreateCache();
    }

    public static CacheService Instance
    {
        get
        {
            // Double-checked locking: only take the lock if no instance exists yet
            if (_instance == null)
            {
                lock (syncRoot)
                {
                    if (_instance == null) _instance = new CacheService();
                }
            }

            return _instance;
        }
    }

    // Creates the factory and the default cache together
    private void CreateCache()
    {
        this._cacheFactory = new DataCacheFactory();
        this._cache = this._cacheFactory.GetDefaultCache();
    }
}
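If you are on .NET 4 or later, the same "create once, keep forever" behaviour can also be had from Lazy<T> instead of hand-written double-checked locking. The sketch below is only an illustration of that alternative: FakeConnectionFactory is a hypothetical stand-in for DataCacheFactory so the example can run without the Azure libraries, and LazyCacheService is an invented name.

```csharp
using System;

// Hypothetical stand-in for an expensive, long-lived factory such as
// DataCacheFactory; it only counts how many times it has been constructed.
public sealed class FakeConnectionFactory
{
    public static int CreatedCount;

    public FakeConnectionFactory()
    {
        CreatedCount++;
    }
}

public sealed class LazyCacheService
{
    // Lazy<T> runs the delegate once, on first access to Value,
    // and is thread-safe in its default mode.
    private static readonly Lazy<LazyCacheService> LazyInstance =
        new Lazy<LazyCacheService>(() => new LazyCacheService());

    // Held for the lifetime of the singleton so it is never disposed early.
    private readonly FakeConnectionFactory _factory;

    private LazyCacheService()
    {
        _factory = new FakeConnectionFactory();
    }

    public static LazyCacheService Instance
    {
        get { return LazyInstance.Value; }
    }

    public FakeConnectionFactory Factory
    {
        get { return _factory; }
    }
}
```

Every call to LazyCacheService.Instance returns the same object, so the factory is only ever built once. Whether this reads better than the explicit lock is largely a matter of taste.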

Handling transient errors

Connections to remote services inevitably suffer from the occasional dropout or communication failure. Exceptions such as TimeoutException and ServerBusyException often signal temporary conditions that your application should be able to recover from.

When an application meets this kind of transient error it should retry the operation after a short pause to see whether the conditions that caused the error have cleared. It should retry a limited number of times before giving up and letting the exception propagate up the stack.
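To make the idea concrete, here is a minimal hand-rolled sketch of retrying with a pause. The Retry class, its Execute method and the isTransient callback are illustrative names, not part of any Azure or Enterprise Library API, and a fixed pause between attempts is assumed for simplicity.

```csharp
using System;
using System.Threading;

public static class Retry
{
    // Runs the operation up to maxAttempts times, pausing between attempts.
    // Only exceptions that isTransient accepts are retried; anything else,
    // or a failure on the final attempt, is rethrown to the caller.
    public static T Execute<T>(
        Func<T> operation,
        Func<Exception, bool> isTransient,
        int maxAttempts,
        TimeSpan pause)
    {
        if (operation == null) throw new ArgumentNullException("operation");
        if (isTransient == null) throw new ArgumentNullException("isTransient");
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException("maxAttempts");

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex)
            {
                // Give up on non-transient errors or when attempts run out.
                if (attempt >= maxAttempts || !isTransient(ex)) throw;

                Thread.Sleep(pause);
            }
        }
    }
}
```

A call such as Retry.Execute(() => cache.Get("KEY"), ex => ex is TimeoutException, 3, TimeSpan.FromSeconds(1)) would then make up to three attempts, one second apart, before the TimeoutException reaches the caller.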

There is nothing in the Azure API to help with this kind of error handling, so you have to plumb it in yourself. Fortunately, the Microsoft Enterprise Library’s Transient Fault Handling Block includes a detection strategy for the caching service, so it can manage the retries for you with one or two lines of code:

// The detection strategy decides which cache exceptions count as transient
RetryPolicy<CacheTransientErrorDetectionStrategy> retryPolicy = new RetryPolicy<CacheTransientErrorDetectionStrategy>(RetryStrategy.DefaultClientRetryCount);
return retryPolicy.ExecuteAction<object>(() => this._cache.Get(key));

Note that the caching API does not currently provide any asynchronous methods so you will have to use the synchronous versions for any calls.

If you combine the handling for transient errors with the singleton implementation, then a service class that keeps its factory alive and retries transient failures looks like this:

public sealed class CacheService
{
    private static volatile CacheService _instance;
    private static readonly object syncRoot = new Object();
    private DataCacheFactory _cacheFactory;
    private DataCache _cache;

    // One retry policy shared by every cache operation
    private readonly RetryPolicy<CacheTransientErrorDetectionStrategy> _retryPolicy = new RetryPolicy<CacheTransientErrorDetectionStrategy>(RetryStrategy.DefaultClientRetryCount);

    private CacheService()
    {
        this.CreateCache();
    }

    public static CacheService Instance
    {
        get
        {
            if (_instance == null)
            {
                lock (syncRoot)
                {
                    if (_instance == null) _instance = new CacheService();
                }
            }

            return _instance;
        }
    }

    private void CreateCache()
    {
        this._cacheFactory = new DataCacheFactory();
        this._cache = this._cacheFactory.GetDefaultCache();
    }

    public object Get(string key)
    {
        return this._retryPolicy.ExecuteAction<object>(() => this._cache.Get(key));
    }

    public void Put(string key, object cacheObject)
    {
        this._retryPolicy.ExecuteAction(() => this._cache.Put(key, cacheObject));
    }

    public void Remove(string key)
    {
        this._retryPolicy.ExecuteAction(() => this._cache.Remove(key));
    }
}
