8 February 2015

Protocol buffers for .Net: protobuf-net vs protobuf-csharp-port

Google’s open source serialization format is an efficient way of passing platform-independent and version-tolerant data between end-points. It has become Google’s lingua franca with tens of thousands of separate data definitions being used across numerous RPC and data storage systems.

Protocol buffers define data structures in simple text-based .proto files. In most implementations a compiler is used to derive the code that represents the data and moves it to and from a binary data stream.

Why use protocol buffers?

Protocol buffers are efficient as they give rise to much smaller payloads than JSON or XML. They provide a single means of describing data that can in turn be compiled by different platform implementations. This allows them to act as a language-neutral and platform-neutral way of describing data.
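To give a rough sense of the size difference, here is a minimal sketch in Python that hand-encodes one tiny record as protocol buffer bytes and compares it with compact JSON. The record, its field names and the field numbers are assumptions made up for the illustration, and a real project would use a generated or attribute-based serializer rather than writing bytes by hand.

```python
import json

# Hypothetical record used only for this comparison.
record = {"name": "Ada", "age": 36}

# Hand-encoded protocol buffer bytes, assuming a schema in which
# field 1 is a string (name) and field 2 is an integer (age).
proto_payload = (
    bytes([0x0A, 3]) + b"Ada"   # tag (field 1, length-delimited), length, UTF-8 bytes
    + bytes([0x10, 36])         # tag (field 2, varint), value
)

# Compact JSON with no optional whitespace, to keep the comparison fair.
json_payload = json.dumps(record, separators=(",", ":")).encode("utf-8")

print(len(proto_payload), "bytes as protocol buffers")
print(len(json_payload), "bytes as JSON")
```

For this record the protocol buffer payload is 7 bytes against 23 bytes of JSON, largely because field names never appear on the wire. The ratio will vary with the shape of the data.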

However, one of the main draws is the version tolerance. Protocol buffers are genuinely forwards and backwards compatible. Fields are identified on the wire by number rather than by name, so you can add, rename or remove fields without breaking deserialization, provided you leave existing field numbers alone and don’t remove a field that readers treat as required. You can even rename the class without breaking anything. This gives you an enormous amount of wriggle room in terms of managing change in your data definitions.

This is particularly useful in a more distributed environment where it is close to impossible to get every participant to use the same version of a schema. In a sense, your consumers automatically become tolerant readers that only choke on a payload if it is missing specific information that they are expecting.
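The tolerant reader behaviour can be sketched with a stripped-down version of the wire format. This is an illustrative Python sketch rather than either of the .Net libraries: it handles only integer (varint) and string (length-delimited) fields, and the function names, field numbers and field names are all assumptions invented for the example.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int):
    """Decode a varint starting at pos; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_field(number: int, value) -> bytes:
    """Encode one field as tag + payload. Only ints and strings are supported."""
    if isinstance(value, int):
        return encode_varint((number << 3) | 0) + encode_varint(value)
    raw = value.encode("utf-8")
    return encode_varint((number << 3) | 2) + encode_varint(len(raw)) + raw


def decode(data: bytes, names: dict) -> dict:
    """Decode the fields listed in names and silently skip any others."""
    pos = 0
    message = {}
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:
            value, pos = decode_varint(data, pos)
        elif wire_type == 2:
            length, pos = decode_varint(data, pos)
            value = data[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        if number in names:
            message[names[number]] = value
    return message


# A newer writer sends three fields.
payload = (
    encode_field(1, "Ada")
    + encode_field(2, 36)
    + encode_field(3, "ada@example.com")
)

# An older reader only knows fields 1 and 2, so field 3 is skipped.
old_reader = decode(payload, {1: "name", 2: "age"})

# Another reader has renamed field 1. The bytes on the wire are unchanged.
renamed_reader = decode(payload, {1: "full_name", 2: "age", 3: "email"})

print(old_reader)
print(renamed_reader)
```

Both readers consume the same bytes. The names exist only in each reader's own mapping, which is why a rename is harmless while reusing a field number for a different meaning is not.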

The one potential downside is that the format is not self-describing, i.e. you can’t determine the content of a message by inspecting the serialized payload without the matching data definition. This isn’t much of a problem as it is normally the metadata about a message, rather than the actual data, that matters when debugging. You rarely have to inspect a message in flight.

Using protocol buffers in .Net

Google have only developed protocol buffers for Java, C++ and Python so two very different implementations have emerged in the .Net world. These implementations have much in common. For starters, they are both faithful implementations of protocol buffers developed by men who have obscenely high ratings on Stack Overflow.

  • Protobuf-csharp-port is written by Jon Skeet and is a faithful port of Google’s Java implementation that uses similar command-line tooling. You create the data definitions in text-based .proto files and use a code generation tool to create the C# classes.
  • Protobuf-net is written by Marc Gravell and will be more familiar to .Net developers. The implementation uses .Net classes and attributes to define data rather than .proto files. Overall, the code resembles existing .Net serializers such as the DataContractSerializer.

In general, protobuf-net fits more snugly into an existing .Net domain model or code-first scenario. It doesn’t require the same fiddling around with compilation of .proto files, which seasoned .Net developers may regard as a journey into command-line hell. You can always generate .proto files from the data classes, though this is not quite the same thing as providing platform-independent data definitions.

If you’re working in a heterogeneous environment where you’ll be sharing cross-platform .proto files from the start then protobuf-csharp-port might be a more natural choice. Although you’ll be asserting the same model across the piece, be aware that you will still have to insert .Net-specific options into .proto files in order to specify things like .Net namespaces. These options are platform-specific “noise” that is ignored by other implementations.

The code generation in protobuf-csharp-port does assert a very specific pattern based on builders and immutable data classes. This does have a certain appeal as it makes it clear to a consumer that you are dealing with a value rather than an object with a set of behaviours. The implication is that it is only the data that is being transmitted over the wire and anything else gets left behind.
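The shape of that pattern can be sketched in a few lines. This is a Python illustration of the general idea of a builder paired with an immutable value, not the C# code that protobuf-csharp-port generates. Person, PersonBuilder and the set_ methods are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Person:
    """Immutable message: fields cannot be reassigned after construction."""
    name: str
    age: int


class PersonBuilder:
    """Mutable builder that accumulates values and then emits a frozen Person."""

    def __init__(self):
        self._name = ""
        self._age = 0

    def set_name(self, name: str) -> "PersonBuilder":
        self._name = name
        return self  # returning self allows calls to be chained

    def set_age(self, age: int) -> "PersonBuilder":
        self._age = age
        return self

    def build(self) -> Person:
        return Person(name=self._name, age=self._age)


person = PersonBuilder().set_name("Ada").set_age(36).build()

try:
    person.age = 37  # any attempt to mutate the built message fails
    mutated = True
except FrozenInstanceError:
    mutated = False

print(person, "mutated:", mutated)
```

All mutation happens in the builder. Once build() has been called the consumer holds a plain value that can be passed around or serialized without anyone changing it underneath them.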

Finally, it’s worth noting that Jon Skeet has nearly 750k reputation on Stack Overflow compared to Marc Gravell’s miserable 500k. One wonders where they find the time…

Filed under C#, Design patterns, Integration, SOA.