22 June 2019
Monorepos. For when life isn’t already complicated enough.
Monorepos are usually associated with the organisation-wide repository used by the likes of Google and Facebook. Is this another case of big-tech inspired cargo cultism, where the techniques used in galactic-scale technical businesses are adopted in the hope that some of that magic might rub off?
Google have published a much-cited paper that claims many advantages for using a single repository. It provides a unified approach to versioning and ensures one "source of truth" for their code base. It facilitates code sharing and collaboration between teams. The code structure is more visible, and dependencies are easier to manage. You can make widespread atomic changes to enable large-scale refactoring across the code base.
It’s important to note that Google are trying to solve engineering problems on a truly grand scale – i.e. billions of lines of code spread over millions of source files that are worked on by tens of thousands of developers. This isn’t the scale of problem that most development shops need to address.
It’s also important to realise that this isn’t a binary choice between a galactic-scale, organisation-wide repository and an abundance of tiny, single-component repos. A monorepo simply means that you are storing multiple deployable components in the same repository. For example, project-scale monorepos are widely used for front-end frameworks such as React or Angular as these tend to be composed of numerous small components.
Visibility and dependency
If you follow the dogma that each repository should contain a single deployable unit, it’s surprising how quickly sprawl can set in. Once teams start creating repositories in a component- or microservice-based architecture, you can easily be overwhelmed with hundreds of separate repositories.
You don’t necessarily address this problem just by making it more visible in a single, shared repository. You will still accumulate mess. Abandoned projects will tend to hang around in the code base for longer than they should. This is an organisational and management problem as much as a technical challenge. You will need a clear process of ownership and review to ensure that any noise in the code base is kept to a minimum.
Using a monorepo might make dependencies more visible, but it doesn’t make them go away. There will still be consequences to making breaking changes. You will need to ensure that every consumer is updated in response to the change. This can get nasty if you end up having to recompile everything just to be sure that your dependencies are working out properly.
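To make the point about consumers concrete, here is a minimal sketch of working out which components are touched by a breaking change. The component names and the DEPENDS_ON map are invented for illustration, and real tooling would derive the graph from build files rather than a hand-written dictionary.

```python
from collections import defaultdict, deque

# Hypothetical dependency map: component -> components it depends on.
DEPENDS_ON = {
    "web-app": {"ui-kit", "api-client"},
    "api-client": {"schemas"},
    "billing-service": {"schemas"},
    "ui-kit": set(),
    "schemas": set(),
}


def affected_by(changed, depends_on):
    """Return every component that depends, directly or transitively, on `changed`."""
    # Invert the map so we can walk from a component to its consumers.
    dependents = defaultdict(set)
    for component, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(component)

    # Breadth-first walk outwards from the changed component.
    affected = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for consumer in dependents[current]:
            if consumer not in affected:
                affected.add(consumer)
                queue.append(consumer)
    return affected


if __name__ == "__main__":
    print(sorted(affected_by("schemas", DEPENDS_ON)))
```

In this toy graph a change to "schemas" reaches three other components, including "web-app", which never references it directly. The repository makes that list easier to compute, but somebody still has to update and test everything on it.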
Increased visibility can even lead to more dependencies. If you reduce the cost of managing dependencies, engineers may be more likely to add them. Teams are less incentivised to create autonomous services with well-defined APIs and well-understood dependency graphs.
Visibility is not always a blessing, and one side-effect noted by Google was unintended dependency. Some teams stopped producing documentation, expecting engineers to read the code instead. This started to backfire as engineers became too acquainted with the implementation detail in services. Teams found themselves having to support features that had been inadvertently exposed to developers.
Google did take a more process-based approach to managing dependencies in a monorepo. They developed tooling to detect and evaluate dependencies, mark APIs as deprecated and remove dead code. There have also been changes in engineering practices to encourage more explicit management of public interfaces and dependencies.
Single commit and build
A monorepo does offer the prospect of being able to commit, build and deploy all your code at once. It’s much easier to assert a consistent build and release process across a single repository. You can realise continuous integration without having to manage all the messiness around incompatible components.
The reality is that you can’t build an entire organisation’s code base and run all the tests in response to every check-in. A large monorepo inevitably requires a sophisticated build system (e.g. Bazel or Buck) that can track internal dependencies and cache any artefacts that don’t need to be rebuilt. This increases complexity and inevitably places a greater burden on your build infrastructure.
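The caching idea can be sketched in a few lines. This is not how Bazel or Buck work internally, just an illustration of the principle under simple assumptions: each component’s sources are reduced to a single string, the dependency graph has no cycles, and the function names are made up for the example.

```python
import hashlib


def cache_key(component, sources, depends_on, _memo=None):
    """Hash a component's own sources together with the keys of its dependencies."""
    if _memo is None:
        _memo = {}
    if component in _memo:
        return _memo[component]
    digest = hashlib.sha256(sources[component].encode("utf-8"))
    # Sort dependencies so the key does not depend on set iteration order.
    for dep in sorted(depends_on.get(component, ())):
        digest.update(cache_key(dep, sources, depends_on, _memo).encode("ascii"))
    _memo[component] = digest.hexdigest()
    return _memo[component]


def stale_components(sources, depends_on, previous_keys):
    """Return the components whose key differs from the last recorded build."""
    memo = {}
    return {
        name
        for name in sources
        if cache_key(name, sources, depends_on, memo) != previous_keys.get(name)
    }


if __name__ == "__main__":
    sources = {"schemas": "v1", "api-client": "client code", "ui-kit": "widgets"}
    depends_on = {"api-client": {"schemas"}}
    last_build = {name: cache_key(name, sources, depends_on) for name in sources}

    sources["schemas"] = "v2"  # simulate an edit to one component
    print(sorted(stale_components(sources, depends_on, last_build)))
```

Editing "schemas" marks both it and "api-client" as stale while "ui-kit" keeps its cached artefact. Even this toy version needs a store of previous keys and an accurate dependency map, which hints at where the extra infrastructure burden comes from.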
Despite this complexity, much value is attached to having a single commit that describes the current state of the world. For example, it could make large-scale refactoring easier to implement in a single, atomic commit. That said, if you’re regularly having to commit changes to multiple services you may need to consider building more autonomy into your components. This is a problem that could be solved through architecture rather than source code storage.
Reducing code duplication
There is an argument that a monorepo reduces code duplication by making it easier for developers to navigate between projects and leverage existing libraries. This does betray a naïve view of code reuse initiatives. You need a lot more than improved visibility and easier navigation to establish genuinely reusable components. They need to be explicitly curated and managed, as reusable code rarely springs fully formed from normal development projects.
The ugly truth here is that an enormous repository becomes too unwieldy for developers to use in its entirety. Virtual file system tools (e.g. GVFS) are often used to allow developers to work with only a portion of the repository. This can defeat the object of using a monorepo if engineers are dividing it into separate virtual repositories. Once this happens you may also need to provide extra tools to help engineers search and discover code across the entire repository.
Engineering culture is relevant here. Developers already have tooling available to them to search across repositories. If they are not inclined to use it then pushing all the code into a single repository will not solve the problem. All you’re doing here is taking the horse to a different body of water.
Fixating on a solution?
Monorepos can’t be considered in isolation as they address problems that are closely linked to both engineering culture and architecture. A well-evolved engineering culture can accelerate development through techniques such as trunk-based development, a consistent process of code review and clear code ownership. Sound architectural design can do much to address dependency and lower the cost of change.
The only thing that switching to a monorepo really guarantees is extra complexity. A whole ecosystem of tooling has sprung up to try and plug some of the functional gaps that monorepos present at scale. In many cases you may be better off tackling these problems with more basic improvements in process, culture or architecture.