8 March 2019
Sharing microservices across large organisations
Large organisations often try to achieve economies of scale through shared code initiatives. These days this means trying to share microservices between different business units. After all, if we’re all building small, autonomous service implementations, then surely it should be possible to share them between companies, divisions or countries?
The problem is that sharing microservices across organisational boundaries is much harder than it looks. There’s much more to it than simply publishing an API. There are numerous technical, operational, organisational and financial issues that need to be considered if you want to be successful.
A key debate is whether internally-developed services should be held to the same standards as commercial products. What are the advantages in consuming an internal service, beyond helping to achieve corporate economies of scale? Is everybody expected to behave like a SaaS supplier, or should we be setting the bar a little lower? Where might an acceptable trade-off be, e.g. forgoing a professional-grade service level agreement in return for a direct say in the roadmap?
Can the service guarantee any service levels in terms of expected up-time and quality of service? What are the backup and disaster recovery arrangements? What happens if the service levels are not met – are there any penalties or refunds?
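An up-time figure is easier to negotiate once it is translated into permitted downtime. For example, 99.9% over a 30-day month allows 30 × 24 × 60 × 0.001 = 43.2 minutes of downtime. The following is a minimal Python sketch of that arithmetic. It assumes a simple availability percentage measured over a fixed window. Real SLAs often define measurement windows, maintenance exclusions and partial outages differently.

```python
from datetime import timedelta


def allowed_downtime(availability_pct: float, period: timedelta) -> timedelta:
    """Return the maximum downtime an availability target permits over a period."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be a percentage between 0 and 100")
    # The unavailable fraction of the period is simply 1 - availability.
    return period * (1.0 - availability_pct / 100.0)


if __name__ == "__main__":
    month = timedelta(days=30)  # assumed measurement window
    for target in (99.0, 99.9, 99.99):
        print(f"{target}% over 30 days allows {allowed_downtime(target, month)} of downtime")
```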
The expectations around an SLA should not be any different just because the service is provided internally. This can be difficult for large organisations where individual business units aren’t accustomed to managing a customer/supplier relationship with each other.
If you have an SLA, then you will also need some clarity around support expectations. This can be difficult when each part of the organisation has their own help desk arrangements.
The main difficulty comes when seeking to triage problems that may be manifested across numerous different microservices. If ownership of these services is spread between different organisational units, it can undermine the clarity and ownership of issues. In the worst case, issues can be passed between different organisational silos without any prospect of resolution.
If you are using an internally-provided service, it may be reasonable to expect greater visibility over the implementation detail. Consumers may want access to production monitoring so they can see what’s going on under the hood. If internal policies allow, there could be scope for reviewing the architecture or even having a look at the source code.
These are all demands that would seem intrusive to an external supplier, but they could be regarded as a benefit that can only be provided by internal provision.
Authentication and identity
If services are going to collaborate then some form of common authentication and identity mechanism is required, preferably one that enables single sign-on.
Ultimately, you’ll need to adopt some common standards, but this can be fraught with disagreement. A protocol like OAuth 2.0 is vague around the edges and leaves plenty of room for interpretation. What often happens is that a centralised implementation is selected as the standard while other services gradually (and reluctantly) conform to it.
Note that authorization is a separate debate, i.e. how you determine what authenticated identities are allowed to do. It is very difficult to assert a common permissions model between organisations without getting weighed down by complexity. You might be able to agree how permissions should be represented in an identity model (e.g. claims in an access token), but the interpretation of these may be best left to individual products.
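One way to keep that split clean is for the shared identity layer to issue claims while each product owns the mapping from claims to its own permissions. The following is a minimal Python sketch of that approach. It assumes the access token has already been validated and decoded into a dictionary, and that the token carries a `roles` claim. The claim name, role names and permission strings are all hypothetical.

```python
from typing import Mapping

# Hypothetical mapping owned by a single "orders" product. Another product is
# free to interpret the same shared roles quite differently.
ORDERS_PERMISSIONS: dict[str, frozenset[str]] = {
    "finance-admin": frozenset({"orders:read", "orders:refund"}),
    "sales-user": frozenset({"orders:read", "orders:create"}),
}


def local_permissions(claims: Mapping[str, object]) -> set[str]:
    """Translate the shared (hypothetical) 'roles' claim into this product's permissions.

    Assumes signature, issuer, audience and expiry checks have already been
    performed by the shared authentication infrastructure.
    """
    roles = claims.get("roles", [])
    if not isinstance(roles, list):
        return set()  # malformed claim: grant nothing
    granted: set[str] = set()
    for role in roles:
        if isinstance(role, str):
            # Roles this product does not recognise simply grant nothing.
            granted |= ORDERS_PERMISSIONS.get(role, frozenset())
    return granted


if __name__ == "__main__":
    claims = {"sub": "alice", "roles": ["sales-user", "hr-manager"]}
    print(sorted(local_permissions(claims)))  # ['orders:create', 'orders:read']
```

The identity model only has to agree on how roles are carried in the token. Each product decides what those roles mean, so adding a permission in one product never requires a cross-organisation negotiation.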
If you are trying to share services between countries and markets, there may be significant differences in security standards and requirements. What level of compliance is required around data handling? Will the service be able to provide regular pen test results and static code scans? Are there any processes in place for incident management and reporting? A clear baseline is required, but not one so onerous that it prevents any service collaboration.
Communication and integration
You will need to agree some “rules of the road” in terms of how interfaces are exposed and consumed.
It makes sense to standardise in this area, but it can be tricky to gain consensus between participants. You could expose REST or gRPC APIs, but these synchronous request/response styles tend to give rise to widespread temporal coupling between services. Event-based messaging can help to reduce coupling, but it tends to add complexity. Even if you agree on the protocol for an API, there can still be much disagreement around the style of implementation.
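The difference in coupling comes down to whether both parties need to be running at the same time. The following is a minimal, in-memory Python sketch of the event-based style. The hypothetical `Topic` class stands in for a durable broker, so a publisher can emit events while the consumer is offline and the consumer catches up later. A real broker would add persistence, acknowledgements and retries, which is where much of the extra complexity comes from.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Event:
    kind: str
    payload: dict[str, Any]


@dataclass
class Topic:
    """Hypothetical in-memory stand-in for a durable message broker topic."""

    _pending: deque = field(default_factory=deque, init=False)

    def publish(self, event: Event) -> None:
        # The publisher returns immediately; no consumer has to be online.
        self._pending.append(event)

    def drain(self, handler: Callable[[Event], None]) -> int:
        """Deliver queued events, in order, to a consumer that is now available."""
        delivered = 0
        while self._pending:
            handler(self._pending[0])
            self._pending.popleft()  # remove only after the handler succeeds
            delivered += 1
        return delivered


if __name__ == "__main__":
    orders = Topic()
    # The billing consumer is "down" while these events are published.
    orders.publish(Event("OrderPlaced", {"order_id": 1}))
    orders.publish(Event("OrderPlaced", {"order_id": 2}))
    # Later, billing comes back and processes the backlog.
    received: list[Event] = []
    print(orders.drain(received.append))
```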
Endpoints and API management
API management can be a further complication. Are you going to allow another organisation to control policies such as rate limiting and quotas? What about managing a developer portal and on-boarding new users? Should each business unit have separate API management instances or would you consolidate this infrastructure?
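Whoever owns the API management layer effectively owns policies such as rate limits. The following minimal Python sketch shows one common rate-limiting algorithm, the token bucket, to illustrate what is being delegated. In practice this would usually be configured in an API gateway rather than hand-written, and any rate and capacity values are hypothetical. The clock is injectable so the behaviour can be tested deterministically.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TokenBucket:
    """Allow on average `rate` requests per second, with bursts of up to `capacity`."""

    rate: float
    capacity: float
    clock: Callable[[], float] = time.monotonic
    tokens: float = field(init=False)
    updated: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start full so an initial burst is allowed
        self.updated = self.clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # a gateway would typically respond with HTTP 429
```

A consumer-facing quota, such as requests per day, is the same idea with a much larger capacity and a slower refill. This is why it matters which organisation gets to set those two numbers.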
Discoverability is a related issue. If we’re going to publish API implementations across the organisation, how do we ensure that consumers can find the API, figure out its intent and know who to get in touch with?
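At a minimum, a discoverable API needs a record of what it is for and who owns it. The following is a minimal Python sketch of a service catalogue. It assumes a simple in-memory registry. The field names, team name, email address and URLs are hypothetical. A real organisation would more likely use a developer portal or an existing catalogue tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueEntry:
    name: str
    purpose: str     # the API's intent, in plain language
    owner_team: str  # the organisational unit accountable for it
    contact: str     # who consumers should get in touch with
    spec_url: str    # e.g. a link to the API's Swagger/OpenAPI document


class ServiceCatalogue:
    def __init__(self) -> None:
        self._entries: dict[str, CatalogueEntry] = {}

    def register(self, entry: CatalogueEntry) -> None:
        if entry.name in self._entries:
            raise ValueError(f"{entry.name!r} is already registered")
        self._entries[entry.name] = entry

    def search(self, term: str) -> list[CatalogueEntry]:
        """Case-insensitive match on name or purpose."""
        term = term.lower()
        return [
            e for e in self._entries.values()
            if term in e.name.lower() or term in e.purpose.lower()
        ]


if __name__ == "__main__":
    catalogue = ServiceCatalogue()
    catalogue.register(CatalogueEntry(
        name="customer-addresses",
        purpose="Validate and store postal addresses for customers",
        owner_team="UK Retail Platform",
        contact="addresses-team@example.com",
        spec_url="https://example.com/specs/customer-addresses.yaml",
    ))
    for entry in catalogue.search("postal"):
        print(entry.name, "->", entry.contact)
```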
The developer experience
Any external service is normally expected to provide a reasonable set of up-to-date documentation along with samples to get developers up and running. Dropping Swagger on top of a bunch of APIs is not really enough. Services may also be expected to provide sandbox environments to allow developers to build integrations. Facilities that help developers debug problems are also to be expected, along with meaningful error messages and maybe even access to project developers.
This all requires fairly mature, developer-orientated platforms and processes. The reality is usually a lot more basic, even for public-facing APIs with external customers. It is less likely that an organisation will invest in this kind of infrastructure just to accommodate a small number of internal collaborators.
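Meaningful error messages are one of the cheaper items on that list to standardise across an organisation. One option is the "problem details" JSON format from RFC 7807, which has since been updated by RFC 9457. It is served with the `application/problem+json` media type. The following is a minimal Python sketch of a function that builds such a body. The `type` URI and the `correlation_id` extension member in the usage example are hypothetical.

```python
import json
from typing import Any, Optional

PROBLEM_MEDIA_TYPE = "application/problem+json"


def problem_details(
    status: int,
    title: str,
    detail: str,
    type_uri: str = "about:blank",
    instance: Optional[str] = None,
    **extensions: Any,
) -> str:
    """Build an RFC 7807 / RFC 9457 problem details body as a JSON string."""
    body: dict[str, Any] = {
        "type": type_uri,  # problem type; "about:blank" adds nothing beyond the status
        "title": title,    # short, human-readable summary of the problem type
        "status": status,  # the HTTP status code, repeated for convenience
        "detail": detail,  # explanation specific to this occurrence
    }
    if instance is not None:
        body["instance"] = instance  # identifies this specific occurrence
    body.update(extensions)  # extension members are permitted by the spec
    return json.dumps(body)


if __name__ == "__main__":
    print(problem_details(
        status=404,
        title="Order not found",
        detail="No order with id 42 exists for this customer.",
        type_uri="https://example.com/problems/order-not-found",
        correlation_id="hypothetical-id-123",
    ))
```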
Adopting a shared service also means taking on somebody else’s vision for its long-term roadmap. Has this been clearly stated and does it fit in with what you want?
Internally-shared services suggest more of a partnership than externally procured products. It is reasonable to expect some visibility over the development backlog. You could even expect a seat at the table when features are being prioritised and roadmaps are being drawn.
Care needs to be taken over API versioning, preferably to avoid breaking changes and to ensure some level of backwards compatibility. External-facing services have a customer base that tends to prevent sudden change, though this discipline might not be in place for internal provision.
Part of any contract between service and consumer should be a clear understanding of how to manage change. This should include a definition of what constitutes a breaking change and how the service will introduce one. A service can’t be expected to keep older versions online forever, but it should at least provide a lengthy deprecation period.
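A written definition of "breaking" can be partly automated. The following is a minimal Python sketch that assumes a deliberately simple policy. A response schema is reduced to a mapping of field names to type names. Removing a field or changing its type is breaking, while adding a field is not. The sketch also shows one way of advertising the end of a deprecation period using the `Sunset` HTTP response header defined in RFC 8594. The 180-day period is a hypothetical example.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def breaking_changes(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List changes from `old` to `new` that would break existing consumers.

    Schemas are simplified to {field name: type name}. Under this assumed
    policy, added fields are non-breaking because tolerant consumers ignore them.
    """
    problems = []
    for name, old_type in old.items():
        if name not in new:
            problems.append(f"removed field '{name}'")
        elif new[name] != old_type:
            problems.append(f"field '{name}' changed from {old_type} to {new[name]}")
    return problems


def sunset_header(announced: datetime, deprecation_period: timedelta) -> tuple[str, str]:
    """Return a Sunset header (RFC 8594) for a version retired after the period."""
    retire_at = (announced + deprecation_period).astimezone(timezone.utc)
    # usegmt=True produces the HTTP-date form ending in "GMT".
    return ("Sunset", format_datetime(retire_at, usegmt=True))


if __name__ == "__main__":
    old = {"id": "integer", "total": "number", "currency": "string"}
    new = {"id": "string", "total": "number", "notes": "string"}
    for problem in breaking_changes(old, new):
        print(problem)
    print(sunset_header(datetime(2019, 3, 8, tzinfo=timezone.utc), timedelta(days=180)))
```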
The more widely a service is shared, the more varied the consumer requirements become. In general, some level of extensibility is helpful as it can allow consumers to add in the missing pieces rather than having to get them added to another organisation’s roadmap.
If services are shared between countries, the requirements for localisation are likely to be more onerous. This will include system messages and errors, which are generally the last pieces that anybody ever considers localising. There may also be a demand for more flexibility over branding.
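Localised error messages are easier to add if errors carry a stable message key rather than hard-coded text. The following is a minimal Python sketch of that approach. It assumes a hypothetical in-memory message catalogue and a simple fallback chain, from a regional locale such as `fr-CA` to its base language and then to a default. Production systems would more typically use an established tool such as gettext or ICU message formatting.

```python
# Hypothetical message catalogue keyed by locale, then by stable message key.
MESSAGES: dict[str, dict[str, str]] = {
    "en": {"order.not_found": "Order {order_id} was not found."},
    "fr": {"order.not_found": "La commande {order_id} est introuvable."},
    "de": {"order.not_found": "Bestellung {order_id} wurde nicht gefunden."},
}


def localise(key: str, locale: str, default_locale: str = "en", **params: object) -> str:
    """Resolve a message for `locale`, falling back to its base language, then the default."""
    candidates = [locale, locale.split("-")[0], default_locale]
    for candidate in candidates:
        template = MESSAGES.get(candidate, {}).get(key)
        if template is not None:
            return template.format(**params)
    return key  # last resort: return the key so the error is still identifiable


if __name__ == "__main__":
    print(localise("order.not_found", "fr-CA", order_id=42))
    print(localise("order.not_found", "pt-BR", order_id=42))
```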
Finally, and probably most important of all, who pays for this? Assuming you can agree a financial model for compensating the organisation providing the service, should this be competitive with externally-available services? What happens when a consumer wants to stop using the service? Are they allowed to withdraw or is there a notice period?
This brings us back to weighing up the trade-offs and the benefits to the consumer. Any charge-back has to be very carefully calibrated alongside all the other aspects of service delivery. After all, it would be difficult to justify charging a significant fee for a shared service with a limited SLA, opaque roadmap and patchy documentation.