26 February 2017

Handling Protocol Buffers backwards compatibility between versions 2 and 3 using C#

If you are using version 2 of Google’s Protocol Buffers serialization format then you will inevitably be pushed towards upgrading your messages to Proto3, particularly if you want a client that supports .NET Standard and .NET Core.

If you have been using the format widely then an upgrade is likely to be a gradual process. It won’t be practical to migrate every client and server simultaneously so you may have to get accustomed to running a mixed environment where two different versions of Protocol Buffers are in use at the same time. This can be done, but it does involve a few speed bumps.

Although the basic format of proto files remains unchanged in terms of fields and data types, the big change is that mandatory ("required") fields and custom default values have been removed from Proto3. This means that a Proto2 contract cannot necessarily be ported directly to Proto3 without changes in both syntax and contract.

You won’t miss mandatory fields and defaults. Eventually.

Google’s rationale for removing mandatory fields is twofold: firstly, they complicated the semantics around fields, as developers were making unnecessary checks for absent values before accessing them; secondly, removing them makes the library easier to implement and more accessible to implementation communities.

Another issue with mandatory fields and defaults is that they can offer a misleading and unenforceable contract. Marking a field as mandatory encourages a client to assume it will be valid, which can lead to weak validation code. For example, values such as zero and a blank string will pass mandatory checks even though they may not be meaningful.

Making every field optional provides a clearer contract to clients. They are explicitly responsible for checking that every field has been populated with something valid.

Message equivalence

It’s important to bear in mind that this is about changes to the APIs rather than the binary data that is transmitted over the wire.

The byte stream remains the same between the two versions. Using Google’s C#-based clients it is possible to freely exchange messages based on different versions of protos so long as the field numbers and data types remain the same. Structures such as mandatory fields and defaults are not included in the serialized binary stream so are irrelevant to the library parsers.
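To make this concrete, here is a minimal hand-rolled sketch of how a single int32 field ends up on the wire. It does not use Google's library: the WireSketch class and its method names are invented purely for illustration. The encoded bytes carry only the field number, the wire type and the value, so nothing about "required", "optional" or a declared default ever reaches the stream.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of Google.Protobuf.
public static class WireSketch
{
    // Encodes an unsigned value as a base-128 varint, least significant group first.
    public static byte[] EncodeVarint(ulong value)
    {
        var bytes = new List<byte>();
        while (value >= 0x80)
        {
            // Low seven bits plus a continuation flag in the top bit.
            bytes.Add((byte)((value & 0x7F) | 0x80));
            value >>= 7;
        }
        bytes.Add((byte)value);
        return bytes.ToArray();
    }

    // Encodes an int32 field as a tag (field number << 3 | wire type 0) followed by the value.
    public static byte[] EncodeInt32Field(int fieldNumber, int value)
    {
        var bytes = new List<byte>();
        bytes.AddRange(EncodeVarint((ulong)((fieldNumber << 3) | 0)));
        // Negative int32 values are sign-extended to 64 bits before encoding.
        bytes.AddRange(EncodeVarint(unchecked((ulong)(long)value)));
        return bytes.ToArray();
    }
}
```

Calling BitConverter.ToString(WireSketch.EncodeInt32Field(1, 10)) gives "08-0A": one tag byte for field 1 and one byte for the value ten. A Proto2 and a Proto3 definition that both declare an int32 as field 1 describe exactly these bytes.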

That said, there is a practical difficulty when consuming serialized version 3 messages in a version 2 client. If a field is mandatory, deserialization will fail if the value is missing from the byte stream. There is no way around this: you will need to wean your version 2 clients off mandatory fields before you can upgrade existing messages to version 3.

Removing mandatory fields and defaults does change the nature of the underlying contract represented by the proto definition. If you do have more than a smattering of mandatory fields and default values then a managed migration is probably a safer alternative to running different versions side-by-side.

Testing for fields in Proto3 messages

The Proto2 API provides a “Has” method for every optional field that makes it easy to test whether a value has been explicitly supplied. This has not been carried over to the Proto3 API, though there is a workaround that makes it possible.

The example below shows a Proto2 message containing a single field with a default value of ten.

syntax = "proto2";
 
package ExampleMessages;
 
message DefaultExample
{
    optional int32 Value = 1 [default = 10];
}

The Proto3 equivalent is pretty straightforward:

syntax = "proto3";
 
package ExampleMessages;
 
message DefaultExample
{
    int32 Value = 1;
}

The problem here is that the resultant API does not provide any way of detecting whether or not the Value field was supplied by the client. The workaround is to wrap the field in a oneof clause as shown below:

syntax = "proto3";
 
package ExampleMessages;
 
message DefaultExample
{
    oneof Value_present 
    {
        int32 Value = 1;
    }
}

This compiles so that an extra ValuePresentCase property is generated alongside the Value property. It can be used to detect whether the value was supplied or not, as shown in the C# example below:

var data = ExampleMessages.DefaultExample.Parser.ParseFrom(bytes);
 
if (data.ValuePresentCase == ExampleMessages.DefaultExample.ValuePresentOneofCase.Value)
{
    // Value was supplied
}
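One use for this check is restoring the default that the Proto2 contract used to supply. The sketch below is an assumption-laden illustration rather than generated code: DefaultExampleStandIn is a hand-written stand-in that only mirrors the shape of the generated DefaultExample class so that the example is self-contained, and DefaultExampleDefaults is a hypothetical helper name. In a real project you would pass the generated message type instead.

```csharp
using System;

// Hypothetical stand-in that mimics the members of the generated DefaultExample class.
public sealed class DefaultExampleStandIn
{
    public enum ValuePresentOneofCase { None = 0, Value = 1 }

    private int value_;

    public ValuePresentOneofCase ValuePresentCase { get; private set; }

    public int Value
    {
        // Reads as zero when the oneof is unset.
        get => ValuePresentCase == ValuePresentOneofCase.Value ? value_ : 0;
        // Assigning any value, including zero, marks the oneof as set.
        set { value_ = value; ValuePresentCase = ValuePresentOneofCase.Value; }
    }

    public void ClearValuePresent()
    {
        value_ = 0;
        ValuePresentCase = ValuePresentOneofCase.None;
    }
}

// Hypothetical helper: the default now lives in client code rather than in the proto file.
public static class DefaultExampleDefaults
{
    // The default of ten that the Proto2 definition used to declare.
    public const int ProtoTwoDefault = 10;

    public static int ValueOrDefault(DefaultExampleStandIn message)
    {
        return message.ValuePresentCase == DefaultExampleStandIn.ValuePresentOneofCase.Value
            ? message.Value
            : ProtoTwoDefault;
    }
}
```

With this in place an unset message yields ten, while a message where the sender explicitly set Value to zero yields zero, which is the distinction that a plain Proto3 int32 field cannot express.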

No more immutability

The other main aspect of the API that has been dropped for version 3 is the builder pattern and the immutability of messages. The builder allowed you to make a clear statement of intent about the data before it was transmitted. It was a clear guarantee that playtime was over and it was time to transmit the data.

Alas, this has been removed from version 3 so a protocol buffer behaves pretty much like any other POCO. This doesn’t affect compatibility but it does represent another change in the tone of the contract provided by protocol buffers when you upgrade to Proto3.
