Kafka on Azure Event Hub – does it miss too many of the good bits?

Microsoft has added a Kafka façade to its Azure Event Hubs service, presumably in the hope of luring Kafka users onto its platform. This makes sense, as the two platforms have a lot in common.

Both platforms were designed to handle large-scale event streams involving multiple clients and producers. They do this by providing a distributed, partitioned and replicated commit log. Messages are published to topics with separately configurable retention periods. Consumption is partitioned across consumers through consumer groups, and clients are expected to manage their own cursor to track their position in a stream.
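As a rough sketch of this shared model, here is what a minimal Kafka consumer looks like when it joins a consumer group and manages its own offsets. The broker address, group id and topic name are placeholders:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CursorExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");   // placeholder broker address
        props.put("group.id", "billing-projection");     // consumer group controls partition assignment
        props.put("enable.auto.commit", "false");        // the client owns its own cursor
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("payments"));      // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    process(record);                      // application-specific handling
                }
                consumer.commitSync();                    // explicitly advance the stored offset
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("partition=%d offset=%d value=%s%n",
                record.partition(), record.offset(), record.value());
    }
}
```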

As with all distributed systems, Kafka is decidedly non-trivial to maintain at scale. It takes quite a commitment to set up and manage a cluster in your own infrastructure. At the time of writing, hosted versions of Kafka are not shy about charging a premium to manage a cluster for you. The going rate appears to be in the region of $15-20 per day for an entry-level rig that can handle 1MB/sec throughput.

By contrast, Azure Event Hubs can deliver similar throughput for just over a fiver a day. It is a pure PaaS offering that abstracts away all the detail of cluster management, giving you global resilience, disaster recovery and scalability without any management or configuration burden. Throughput is provisioned by reserving throughput units, and you can control basic aspects such as message retention and the number of partitions for each topic.

Adding API façades to encourage migrations

The Kafka façade should allow applications to switch to Azure Event Hubs with minimal code changes. Kafka concepts such as partitions, consumer groups and offsets have direct equivalents in Azure. The provisioning of throughput units to scale an instance works in the same way no matter which API you are using.
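In practice the switch is mostly configuration. The sketch below follows the documented pattern of pointing a standard Kafka producer at the Event Hubs Kafka endpoint over SASL/SSL; the namespace, topic and connection string are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventHubsKafkaProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // The only changes from a plain Kafka setup are the endpoint and SASL settings.
        props.put("bootstrap.servers", "mynamespace.servicebus.windows.net:9093"); // placeholder namespace
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username=\"$ConnectionString\" "
                + "password=\"<event-hubs-connection-string>\";");                 // placeholder secret
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("payments", "order-42", "created")); // topic maps to an event hub
            producer.flush();
        }
    }
}
```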

Presumably, the intention is to provide a migration path to Azure for anybody who has already made an investment in Kafka. A similar approach has been taken with CosmosDB, which has API support for MongoDB and Cassandra clients. Microsoft is not providing actual MongoDB or Cassandra instances, merely a compatibility layer that minimises the code changes required to move.

The problem with these implementations is that they are only partially realised. The client APIs may be the same, but under the hood all the provisioning and partitioning is still very much based on CosmosDB. In practical terms this means that your MongoDB shard keys will behave very differently, you lose control over how partitions are created, and there are some limits on query capability.

The façades can also deny you access to native CosmosDB features. For example, the native SQL API for CosmosDB lets you track request unit usage by returning the request charge for each operation. This is particularly useful for sizing workloads and working out a provisioning strategy, but it isn't available through the MongoDB and Cassandra façades.
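As a rough illustration, with the Java SDK for the SQL API the request charge comes back on every response object; the account, database, container and item names below are placeholders:

```java
import com.azure.cosmos.CosmosClient;
import com.azure.cosmos.CosmosClientBuilder;
import com.azure.cosmos.CosmosContainer;
import com.azure.cosmos.models.CosmosItemResponse;
import com.azure.cosmos.models.PartitionKey;

public class RequestChargeExample {
    public static void main(String[] args) {
        CosmosClient client = new CosmosClientBuilder()
                .endpoint("https://myaccount.documents.azure.com:443/") // placeholder account
                .key("<account-key>")                                   // placeholder secret
                .buildClient();

        CosmosContainer container = client.getDatabase("shop").getContainer("orders"); // hypothetical names

        // The response carries the request units consumed by this single read.
        CosmosItemResponse<Order> response =
                container.readItem("order-42", new PartitionKey("customer-7"), Order.class);
        System.out.println("Request charge (RUs): " + response.getRequestCharge());

        client.close();
    }

    public static class Order { public String id; public String customerId; }
}
```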

Kafka limitations in Azure Event Hubs

Similar limitations apply to the Kafka façade on Azure Event Hubs. The API lets you connect, send and receive, but some Kafka-specific features are missing, and the omissions here feel a little more severe.

The main missing area is Kafka's support for "exactly once" delivery semantics. What this really means can be a little nuanced, but in essence it supports a pipeline between producer and consumer where each message is processed only once. This allows the same result to be obtained from a stream even if there are network failures or consumer crashes during processing.

This relies on two key features, neither of which is supported in Azure Event Hubs. For event producers, message sending can be made idempotent to avoid duplicate messages. This is achieved by assigning a unique identifier to the producer and enforcing a sequence number on new messages. Note that if a producer fails completely it is given a new identifier on restart, so the guarantee only holds for a single producer session.
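Against native Kafka, turning this on is a single producer setting; the sketch below (broker and topic are placeholders) is the sort of configuration that won't buy you anything on Event Hubs:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class IdempotentProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");   // placeholder broker
        props.put("enable.idempotence", "true");         // broker assigns a producer id and tracks sequence numbers
        props.put("acks", "all");                        // required when idempotence is enabled
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Retries caused by transient network failures no longer produce duplicates
            // within this producer session.
            producer.send(new ProducerRecord<>("payments", "order-42", "captured"));
            producer.flush();
        }
    }
}
```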

When writing data, Kafka also supports atomic writes across multiple topics and partitions, providing transactional semantics for groups of messages. On the consumer side, you can change the isolation level of a client to either respect transactions or ignore them for better performance.
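A sketch of the producer side against native Kafka (broker, topics and transactional id are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TransactionalWriteExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");     // placeholder broker
        props.put("transactional.id", "order-pipeline-1"); // stable id identifies the producer across restarts
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            try {
                // The writes go to different topics, but commit or abort together.
                producer.send(new ProducerRecord<>("orders", "order-42", "accepted"));
                producer.send(new ProducerRecord<>("audit", "order-42", "order accepted"));
                producer.commitTransaction();
            } catch (RuntimeException e) {
                producer.abortTransaction();
                throw e;
            }
        }
    }
}
```

On the consuming side, setting isolation.level to read_committed is what hides messages belonging to aborted or still-open transactions.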

Idempotent producers and transactions aren't the only omissions. You won't be able to use Kafka Streams either, the API that lets you transform data and build queryable projections from event streams. You also miss out on Kafka Connect, the connectivity framework that makes it much easier to transfer data between Kafka and other systems. There are also a few missing management features, such as adding partitions to an existing topic, setting retention based on size rather than time, and the HTTP-based Kafka REST API.
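As a taste of what Kafka Streams gives you on a native cluster, the sketch below maintains a continuously updated, queryable count of events per key from a topic; the application id, broker, topic and store names are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;

public class OrderCountProjection {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-count-projection"); // placeholder app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");         // placeholder broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> orders = builder.stream("orders");   // hypothetical topic
        // Continuously maintain a count of events per key, backed by a queryable state store.
        orders.groupByKey().count(Materialized.as("orders-per-customer"));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```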

Taken together, this amounts to a neutered platform compared to a native Kafka implementation. Azure Event Hubs is just a streaming transport, lacking the more sophisticated delivery and processing features that are found in Kafka. It’s a great choice for simpler message-streaming scenarios, but may not be so useful if you have already made a significant investment in Kafka’s more advanced features.