22 April 2013

Writing a brokered messaging client for Azure Service Bus that is production-ready

Azure’s Brokered Message API provides a basic set of methods that make it easy to start sending and receiving messages through the Azure Service Bus. The problem is that it doesn’t do much to provide some of the basic scaffolding required by a serviceable messaging client. You are expected to provide most of this yourself.

Using asynchronous methods

Sending and receiving messages to and from the Azure Service Bus will inevitably incur a fair amount of latency given that it is a remote service. Requesting messages can be particularly drawn out if you send a polling request that maintains a connection to the server until a new message arrives.

If send and receive operations are performed synchronously, each call blocks a thread from the CLR’s thread pool while it waits for the operation to complete. The thread pool has a limited capacity, so these calls should be made asynchronously to avoid unnecessary thread blocking.

Although the Brokered Message API provides synchronous methods for most operations, it is normally recommended that you use the asynchronous implementations. This is straightforward now that asynchronous methods have been added to the brokered messaging API in version 2.0 of the Azure SDK. The code below demonstrates a basic message send using the asynchronous methods. Note that exceptions only surface when the task completes, so you should handle them in a ContinueWith block.

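using System.Threading.Tasks;
using Microsoft.ServiceBus.Messaging;

// Returns the continuation task so the caller can wait on it or observe failures.
public Task SendAsync(BrokeredMessage message, QueueClient client)
{
    return client.SendAsync(message).ContinueWith(t =>
    {
        if (t.Exception != null)
        {
            // Handle what you can here. Rethrowing faults the returned task,
            // so the caller can observe anything that cannot be handled.
            throw t.Exception.Flatten();
        }
    });
}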

Handling transient errors

Connections to a remote service such as Azure Service Bus will inevitably suffer from the occasional drop out or communication failure. Exceptions such as TimeoutException and ServerBusyException are often temporary conditions that your application should be able to recover from.

When an application meets this kind of transient error it should retry the operation after a short pause to see if the conditions that caused the error have changed. You will want a number of retries before finally giving up and regarding the operation as a failure. This also has to happen in the context of asynchronous operations so as not to hold up the calling thread.
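To make the retry idea concrete, here is a minimal hand-rolled sketch in plain .NET with no Service Bus dependency. The TransientRetry class, its ExecuteAsync method and the isTransient predicate are illustrative names rather than part of any SDK, and a fixed pause between attempts is assumed for simplicity.

```csharp
using System;
using System.Threading.Tasks;

public static class TransientRetry
{
    // Runs an asynchronous operation, retrying up to maxRetries times when
    // isTransient reports that the failure is worth another attempt.
    public static async Task ExecuteAsync(
        Func<Task> operation,
        Func<Exception, bool> isTransient,
        int maxRetries,
        TimeSpan pause)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                await operation();
                return;
            }
            catch (Exception ex)
            {
                // Give up on non-transient errors, or once the retries are used up.
                if (!isTransient(ex) || attempt >= maxRetries)
                {
                    throw;
                }
            }

            // Task.Delay pauses without blocking a thread pool thread.
            await Task.Delay(pause);
        }
    }
}
```

A send could then be wrapped along the lines of TransientRetry.ExecuteAsync(() => client.SendAsync(message), ex => ex is TimeoutException, 3, TimeSpan.FromSeconds(1)), with the predicate widened to whichever exceptions you regard as transient.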

Although the Azure client objects provide a fair amount of protection in terms of recovering from error conditions, they do not help with this kind of transient error. This leaves you with a fair amount of plumbing to provide in a client application. Fortunately, the Microsoft Enterprise Library provides a Transient Fault Handling Block for Azure Service Bus that can manage this for you. It allows you to wrap calls to the Azure API within a RetryPolicy which will manage the handling of transient exceptions within the context of a retry loop.

The behaviour of a RetryPolicy can be configured to determine the number of retries and pauses between them. The code example below shows a simple retry policy applied to our original example of an asynchronous send operation.

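using System.Threading.Tasks;
using Microsoft.Practices.EnterpriseLibrary.TransientFaultHandling; // Enterprise Library 6 namespace
using Microsoft.ServiceBus.Messaging;

public Task SendAsyncTransient(BrokeredMessage message, QueueClient client)
{
    // Create a retry policy
    var retryPolicy = new RetryPolicy<ServiceBusTransientErrorDetectionStrategy>(RetryStrategy.DefaultFixed);

    // Send the message asynchronously. The continuation is attached to the task
    // returned by ExecuteAsync rather than to the send itself, so the policy
    // sees a faulted send and can retry it.
    return retryPolicy.ExecuteAsync(() => client.SendAsync(message)).ContinueWith(t =>
    {
        if (t.Exception != null)
        {
            // A non-transient exception occurred or retry limit has been reached
        }
    });
}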
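
RetryStrategy.DefaultFixed relies on the library’s default retry count and interval. As a sketch of setting the number of retries and the pauses explicitly, the strategy classes that ship with the block can be passed to the policy instead. The RetryPolicies class name and the specific counts and intervals are illustrative choices, and the namespace assumes Enterprise Library 6.

```csharp
using System;
using Microsoft.Practices.EnterpriseLibrary.TransientFaultHandling;

public static class RetryPolicies
{
    // Three retries, two seconds apart.
    public static RetryPolicy CreateFixed()
    {
        return new RetryPolicy<ServiceBusTransientErrorDetectionStrategy>(
            new FixedInterval(3, TimeSpan.FromSeconds(2)));
    }

    // Five retries with a pause that grows exponentially between a minimum
    // of one second and a maximum of thirty seconds.
    public static RetryPolicy CreateExponential()
    {
        return new RetryPolicy<ServiceBusTransientErrorDetectionStrategy>(
            new ExponentialBackoff(
                5,
                TimeSpan.FromSeconds(1),
                TimeSpan.FromSeconds(30),
                TimeSpan.FromSeconds(2)));
    }
}
```

A fixed interval is easier to reason about, while a backoff strategy eases the pressure on a service that is already reporting itself as busy.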

Understand the message lifecycle

There are two different ways in which you can manage the lifecycle of messages. If your QueueClient has its ReceiveMode set to ReceiveAndDelete then a message is removed from the queue as soon as it has been received. This is not terribly resilient: if your client runs into trouble while processing the message, the information may be lost.

A more robust approach is to use the PeekLock mode where a message is locked for a specific duration until the client has explicitly determined what to do with it. A client is expected to use an explicit closing method on the message such as Complete() to destroy it or Abandon() to return it to the queue.

This does provide for a more robust and transactional style, but it is a little more difficult to work with. If the client runs out of time before closing the message or loses its connection to Azure then the message is implicitly abandoned and returned to the queue. The message closure methods are all remote calls that are subject to transient errors, so they too need to be made asynchronously and wrapped in a retry policy.
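The shape of a PeekLock handler can be sketched as follows. To keep the example free of any SDK dependency, the closing calls are passed in as delegates; PeekLockSettlement and SettleAsync are hypothetical names, and in a real client the complete and abandon delegates would be the message’s own asynchronous Complete and Abandon calls.

```csharp
using System;
using System.Threading.Tasks;

public static class PeekLockSettlement
{
    // Runs the processing step, then completes the message on success or
    // abandons it on failure. Returns true if the message was completed.
    public static async Task<bool> SettleAsync(
        Func<Task> process,
        Func<Task> complete,
        Func<Task> abandon)
    {
        bool succeeded;
        try
        {
            await process();
            succeeded = true;
        }
        catch (Exception)
        {
            // A real client would log the exception before abandoning.
            succeeded = false;
        }

        // The closing calls sit outside the catch block so they can be awaited.
        if (succeeded)
        {
            await complete();
        }
        else
        {
            await abandon();
        }

        return succeeded;
    }
}
```

Because both closing calls are remote, each delegate is a natural place to apply a retry policy.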

Consider external storage for large messages

You have a size limit of 256 KB for Service Bus messages and up to 5 GB for the queue or topic as a whole. Ideally you should be looking to keep your messages as short as possible so they don’t clog up the queue or take too much time to move over the wire. If you need to include a large amount of information with a message then consider placing it in an external store such as Azure Blob Storage that is visible to both the sending and receiving application. A reference in the message can then point to the stored blob so that the message recipient can retrieve it.
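This arrangement is sometimes called a claim check. The sketch below shows only the decision logic: Envelope, ClaimCheck and the storeBlob and loadBlob delegates are hypothetical names, the delegates stand in for whatever external store you use, and the calls are synchronous for brevity even though real storage calls are remote and would normally be asynchronous.

```csharp
using System;

public sealed class Envelope
{
    // Set when the payload is small enough to travel inside the message.
    public byte[] InlineBody { get; set; }

    // Set when the payload has been placed in the external store.
    public string BlobReference { get; set; }
}

public static class ClaimCheck
{
    // Sender side: keep small payloads inline, store large ones externally
    // and carry only the reference returned by the store.
    public static Envelope Wrap(byte[] payload, int maxInlineBytes, Func<byte[], string> storeBlob)
    {
        if (payload.Length <= maxInlineBytes)
        {
            return new Envelope { InlineBody = payload };
        }

        return new Envelope { BlobReference = storeBlob(payload) };
    }

    // Receiver side: follow the reference if there is one.
    public static byte[] Unwrap(Envelope envelope, Func<string, byte[]> loadBlob)
    {
        return envelope.BlobReference != null
            ? loadBlob(envelope.BlobReference)
            : envelope.InlineBody;
    }
}
```

It is worth setting maxInlineBytes comfortably below the message size limit to leave room for message properties.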

Explicitly managing connections

Azure client objects such as QueueClient and TopicClient are expensive and time-consuming to create. They also maintain active connections to Azure, so you should be sparing about how many you create and cache them for as long as you need them rather than re-creating them for every operation.
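One way to cache clients is a small thread-safe lookup keyed by queue or topic path. This is a sketch rather than a prescribed pattern: ClientCache is a hypothetical class, and the factory delegate is where you would create the real client for a given path.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class ClientCache<TClient>
{
    private readonly ConcurrentDictionary<string, Lazy<TClient>> clients =
        new ConcurrentDictionary<string, Lazy<TClient>>();

    private readonly Func<string, TClient> factory;

    public ClientCache(Func<string, TClient> factory)
    {
        this.factory = factory;
    }

    // Returns the cached client for the path, creating it on first use.
    // Wrapping the client in Lazy ensures the factory runs only once per
    // path, even when several threads ask for the same path at once.
    public TClient Get(string path)
    {
        return clients.GetOrAdd(path, p => new Lazy<TClient>(() => factory(p))).Value;
    }
}
```

A cache for queues might be created with new ClientCache<QueueClient>(path => QueueClient.CreateFromConnectionString(connectionString, path)), where connectionString is supplied by your own configuration. Cached clients should still be closed when the application shuts down.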

You should also bear this in mind when designing your queues, topics and subscriptions. If your messaging design is too diffused then your client applications will have to open and maintain too many separate client objects, creating unnecessary overhead.

Dead-letters and duplicates

A messaging client normally has to deal with a fair amount of “noise” in terms of unexpected, invalid or duplicated messages.

You will occasionally come across a message that cannot be processed for whatever reason. It may be in an unexpected format or contain information that causes an exception in your messaging client. If a message can’t be processed then it will sit at the head of your queue indefinitely unless something is done to remove it.

Azure supports automatic “dead-lettering” where a message is moved to a separate dead-letter queue after a certain number of failed deliveries. However, this is not the end of the story as dead-letter messages contribute to the overall capacity of your queue. If too many accumulate then your queue will become full and stop accepting new messages. This puts the onus on you to provide a means of monitoring and clearing dead-letter queues.

A similar problem can occur with duplicate messages. Some domain designs will give rise to repetitive messages but sometimes this is a sign that a client application is stuck in a processing loop of some kind. Left unchecked this can fill a queue with junk unless duplicates are actively managed in some way. Azure provides automatic support for this by allowing you to specify a time window for detecting duplicate messages. With any messaging model you should give careful consideration as to the impact of duplicates and how frequently you would need to suppress them.
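Both dead-lettering and duplicate detection are configured on the queue itself. The sketch below shows how this might look when creating a queue through the NamespaceManager. QueueSetup and EnsureQueue are hypothetical names, connectionString and queuePath are placeholders supplied by the caller, and the delivery count and time window are illustrative values rather than recommendations.

```csharp
using System;
using Microsoft.ServiceBus;
using Microsoft.ServiceBus.Messaging;

public static class QueueSetup
{
    public static void EnsureQueue(string connectionString, string queuePath)
    {
        var namespaceManager = NamespaceManager.CreateFromConnectionString(connectionString);
        if (namespaceManager.QueueExists(queuePath))
        {
            return;
        }

        var description = new QueueDescription(queuePath)
        {
            // Move a message to the dead-letter queue after this many failed deliveries.
            MaxDeliveryCount = 5,

            // Drop any message whose MessageId has already been seen within the window.
            RequiresDuplicateDetection = true,
            DuplicateDetectionHistoryTimeWindow = TimeSpan.FromMinutes(10)
        };

        namespaceManager.CreateQueue(description);
    }
}
```

Duplicate detection works on the MessageId, so it only helps if the sender assigns the same identifier to messages that it considers to be the same.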

Developing a robust security model

When you first set up an Azure namespace you are provided with the “owner” identity which can be authenticated by a shared secret. This identity lets you create, send or receive anything in your namespace although you won’t necessarily want to give this level of freedom to your client applications. For example, you may want to limit access to particular subscriptions in order to restrict the information available to listening clients.

You can use the Access Control Service to define access control lists that can be applied to individual queues, topics and subscriptions as well as the namespace as a whole. It provides a flexible authorization model, while authentication can be extended to accept federated tokens from a number of different token providers.

Monitoring and instrumentation

Azure Service Bus may be a cloud-hosted piece of infrastructure, but it is still something that needs to be monitored and maintained. The usage-based pricing model can have its downsides. If a client goes rogue and starts sending thousands of unnecessary messages it can start to get expensive if you don’t notice for a while.

You will need to keep a close eye on what is happening to your queues, topics and subscriptions to make sure that everything is behaving as expected. The Azure management portal provides an overview but isn’t much help when things are going wrong. The Service Bus Explorer project can help, though it is targeted more towards developers than sysadmins.

You will need a strategy for monitoring your use of Azure, preferably one based on a series of instrumentation metrics. These can be tied in to alerts that will trigger when your clients start behaving in unexpected ways. A little bit of investment in monitoring can save a fortune in Azure usage fees.
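As a small illustration of a metric tied to an alert, the sketch below counts sends inside a sliding time window and reports when a threshold is exceeded. SendRateMonitor is a hypothetical class, it is not thread-safe as written, and a real system would more likely feed a counter like this into whatever monitoring tooling you already use.

```csharp
using System;
using System.Collections.Generic;

public sealed class SendRateMonitor
{
    private readonly Queue<DateTime> timestamps = new Queue<DateTime>();
    private readonly TimeSpan window;
    private readonly int threshold;

    public SendRateMonitor(TimeSpan window, int threshold)
    {
        this.window = window;
        this.threshold = threshold;
    }

    // Records one send at the given time and returns true when the number
    // of sends inside the window is greater than the threshold.
    public bool RecordSend(DateTime now)
    {
        timestamps.Enqueue(now);

        // Discard sends that have fallen out of the window.
        while (timestamps.Count > 0 && now - timestamps.Peek() > window)
        {
            timestamps.Dequeue();
        }

        return timestamps.Count > threshold;
    }
}
```

Passing the time in as a parameter keeps the class easy to test and leaves the choice of clock to the caller.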

Finally… start by planning your messages and namespaces

When developing a robust mechanism for sending and receiving messages it can be easy to overlook the importance of planning. The range and volume of different messages, queues, topics and subscriptions in your messaging model can have a significant impact on how you design your client applications.

For example, if you need a high throughput queue with a small number of senders and receivers then you can spawn numerous message senders working on parallel threads and leverage batching functionality to increase raw throughput. If you want lower latency then you should disable any batching so messages are sent immediately. In some cases you may want to separate particular messages onto dedicated queues to maximise the volume that can be pushed through the service bus. These are examples of the kinds of design decisions that are driven by the messaging model, so it is wise to try and develop a reasonable understanding of what you will be sending before developing your client application.
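
The batching trade-off is controlled through the messaging factory’s transport settings. The sketch below assumes you already have a namespace URI and a token provider; FactoryBuilder is a hypothetical name, and any interval you pass in should be tuned against your own workload.

```csharp
using System;
using Microsoft.ServiceBus;
using Microsoft.ServiceBus.Messaging;

public static class FactoryBuilder
{
    public static MessagingFactory Create(
        Uri namespaceUri,
        TokenProvider tokenProvider,
        TimeSpan batchFlushInterval)
    {
        var settings = new MessagingFactorySettings { TokenProvider = tokenProvider };

        // TimeSpan.Zero disables client-side batching so messages are sent
        // immediately. A short interval lets the client group operations
        // together, trading a little latency for throughput.
        settings.NetMessagingTransportSettings.BatchFlushInterval = batchFlushInterval;

        return MessagingFactory.Create(namespaceUri, settings);
    }
}
```

Clients created from the returned factory share its batching behaviour, so a latency-sensitive queue and a high-throughput queue may warrant separate factories.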
