22 July 2017

The poor man’s Cassandra: scaling Azure Table Storage to maximise throughput.

Azure table storage isn’t always associated with high throughput workloads – by “high throughput” I mean tens of thousands of records per second. This kind of scale is normally associated with technologies such as Apache Cassandra that are optimised to cope with large data streams.

This muscular throughput does come at a cost of complexity and cost. Running a production instance of Cassandra is not for the faint-hearted. Keeping the right number of nodes up and running to support your workload can be non-trivial and the operational costs can match the size of your data.

In theory you can push up to 20,000 records a second into Azure Tables. This is peanuts compared with some of the numbers that have been thrown at large Cassandra rigs. For example, LinkedIn have pushed throughput for sixty Cassandra nodes to as far as a million records a second. Still, 20k is still not a bad cap for such a cheap service.

And cheap is the operative word here. Pricing for Azure Tables is based on storage with a very small fee for the number of operations that you use. Pushing in a daily load of 100m 1KB records costs as little as £80 a month with extra for storage. Hence the “poor man’s Cassandra” – it’s a great option if you can fit your data into it and live with the limitations this imposes.

The small print around scaling Azure Tables

Azure Tables uses a variable model for throughput performance that depends very much on data you are pushing in. The main limitation is that it can only support 2,000 operations per second per partition. Azure has a fixed primary key based on a row identifier and partition key that you define for each record. Data is split into separate physical partitions on the fly in response to incoming data. This means that you are responsible for distributing any incoming data evenly across partition keys to avoid getting a “hot” partition that is being hit with too much load.

The arrangement of data across partitions affects query performance. Retrieving a records by their primary key is always very fast but Azure Tables resorts to table scans to find any data that is not in the same partition. Each scanned row counts towards that 20,000 operations per second limit. Batch operations such as transactions that affect more than one record are also limited to records in the same partition.

This gives rise to a fundamental trade-off between throughput and query flexibility. Distributing data across as many partitions as possible makes it much easier load large amounts of data without hitting partition limits. This can come at the expense of your ability to query the data or perform batch operations.

The best strategy for maximising throughput is to make your partition key unique for every row. This pretty much guarantees that you won’t have a “hot” partition, but any queries will have to scan an entire table to fetch any results which each row counting towards your 20,000 operations a second limit. This effectively rules out viable query functionality on very large tables.

Getting creative with table organisation

Striking the right balance depends very much on your data and query requirements. Troy Hunt described a good example of scaled partitions based on uploading hundreds of millions of email-based records for his Have I been Pwned? service. Given that he only needed to support retrieving records by email address he found that organising partitions by domain gave him a natural distribution that allowed bulk uploads to avoid partition scaling limits.

To maximise throughput you may need to be creative about the way you organise data. For example, there is no limit to the number of tables you can have in a single account, so you can use table names as a form of broad partition allowing for very fine grained partition keys.

Another approach is to take a multi-tenant view and segment your data across different storage accounts. Azure allows you to provision up to 200 storage accounts, each with a separate limit of 20,000 records per second. This gives you considerable scope for segmenting your data if you can live with the complete logical and physical separation this involves.

Optimising the client

You should also try and make your client applications as efficient as possible. Nagling is a TCP optimization that is designed to reduce network congestion by rolling small requests into larger TCP segments. This can cause a bottleneck for lots of small requests so maybe worth switching off at application start-up as shown below:

// Disable Nagling - call this before opening and clients
ServicePointManager.UseNagleAlgorithm = false;

Assuming that you are making requests in parallel you can also increase the number of concurrent connections maintained by the client from the default of two:

// Increase the connection limit to the number of threads
ServicePointManager.DefaultConnectionLimit = _parallelRequests;

Proximity is also significant when you’re trying to scale up Azure Table Storage. If you’re able to keep your consuming applications in the same Azure location as your storage account then latency is reduced to less than 10ms for calls to retrieve a single row. Perhaps more importantly, you’re not charged for bandwidth usage for transfers within a data centre, making a co-located solution much more cost effective.

Buffering and shaping with queues

Azure table Storage does not like sudden peaks of demand and it tends to respond best when requests purr along at a predictable rate. Anything you can do to shape throughput and smooth out peaks will help to make performance more predictable and avoid spikes in response times.

Size is also significant as a varied stream of data can give rise to wildly fluctuating response times. Although Microsoft publish limits of 20,000 records per second they must be talking about pretty small entities. If you have a fifty-column monster entity then you’ll get considerably fewer than that through every second. Microsoft do provide a detailed explanation around how storage space is used in Azure Tables which can be helpful in planning your table design.

Queues can help to regulate throughput by buffering the data that is pushed through Azure tables. You can also use them to segment data so that similar types of record are processed together, avoiding the kind spikes that undermine overall performance.

Queues can also help to ensure resilience. Although a retry policy is pretty essential when working with Azure Tables, you will eventually run out of road for large peaks and start getting “ServiceUnavailable” errors. Managing your workload in a queue does allow you the luxury of backing off entirely without the risk of shedding any data.

How does Azure Tables compare?

You can scale Azure Tables if you are able to live with the associated limitations. It is extremely cheap compared to high-scaling cloud databases such as DynamoDB and CosmosDB.

CosmosDB looks good value on paper, with burst scaling that is charged by the hour along with impressive guarantees of availability, low latency and global distribution. These hourly costs can really add up if you want to maintain a high level of throughput. Unless you can guarantee that data will be evenly distributed across the partitions that CosmosDB manages then you will have to provision a fair amount of headroom to ensure that individual partitions are not overworked.

When comparing costs it’s important not to fall into the trap of comparing apples and oranges. CosmosDB provides a low-latency, always available, high-throughput, globally distributed database infrastructure that on premise installations will struggle to match. Azure Tables are great value, but you get what you pay for: you will have to live with variable latency, restricted querying capability and more limited failover options.

Filed under Azure, Design patterns.