Don't use test coverage as a target

Measuring test code coverage can be a useful technique for finding the gaps in your automated tests. The problem comes when coverage is used as a target: it can be misleading, encourage the wrong behaviours and even distort development.

This is not to dismiss the importance of test coverage. Any new code written without tests should be regarded as incomplete. Ideally, you should be writing tests before implementing any functionality. The problem comes when arbitrary coverage targets are applied, especially to existing code bases.

What are we trying to measure?

Agile development tends not to throw up usable metrics for management teams to feed on. The focus on working software provides an intangible goal that tends to resist standardisation. Test coverage is one of those statistics that is often picked out as something that can be measured and therefore improved.

The problem is that code coverage itself does not measure anything useful.

A high coverage percentage might look good on a dashboard, but it does not measure the overall quality of the system, nor does it say anything about the effectiveness of your testing effort. After all, you can reach very high coverage numbers with some very low-quality tests.

Setting code coverage targets does not enforce any good programmer behaviours. In fact, it can encourage some bad ones. If you want to nurture good engineering practices then you're better off using a combination of coaching, training and techniques that enhance discipline, such as pair programming.

Engineers can usually be relied upon to find ways to game a set of targets. Coverage percentages tend to inspire anaemic tests that execute code without asserting anything meaningful. To the outside observer, and to tools such as SonarQube, the code will appear "covered", even though the test isn't adding any value.
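
As a rough sketch of the problem (the `InvoiceCalculator` class and module below are hypothetical, and pytest-style tests are assumed), a test can execute code and earn coverage without ever checking a result:

```python
from invoicing import InvoiceCalculator  # hypothetical module and class

def test_calculate_total_runs():
    # Every line this call touches is reported as "covered",
    # but the test verifies nothing about the behaviour.
    calculator = InvoiceCalculator(tax_rate=0.2)
    calculator.calculate_total([100, 250, 75])

def test_calculate_total_includes_tax():
    # The honest version of the same test pins down an actual outcome.
    calculator = InvoiceCalculator(tax_rate=0.2)
    assert calculator.calculate_total([100, 250, 75]) == 510
```

Both tests produce identical coverage figures; only the second will ever catch a regression.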

No tests can be better than bad tests. Writing unit tests can be a fine art. It's easy to fall into the trap of over-abstraction where tests do little more than confirm a language's control flow statements. A bloated test suite increases the amount of code under management and can slow down development, especially if fragile tests need to be rewritten in response to feature changes.
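
For instance (again with hypothetical names), a heavily mocked test can end up restating the implementation's control flow rather than any observable behaviour, so a harmless refactoring breaks the test even though the feature still works:

```python
from unittest.mock import Mock

from orders import OrderService  # hypothetical module and class

def test_process_order_calls_collaborators():
    # With every dependency mocked away, the assertions merely echo the
    # current implementation: "save, then send a confirmation". Change the
    # internals and this test fails, even if the outcome is unchanged.
    repository = Mock()
    mailer = Mock()
    service = OrderService(repository, mailer)

    service.process(order_id=42)

    repository.save.assert_called_once()
    mailer.send_confirmation.assert_called_once()
```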

Each code base has different testing requirements. You can't define a single, uniform target or compare coverage between different code bases. For example, a lot of code in monolithic applications does little more than move data between different layers and may not be in urgent need of unit tests.
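
A hypothetical example of that kind of code is a mapping function that copies fields from one layer's representation to another. A unit test here would largely restate the mapping, which is why such code is rarely the most urgent candidate for coverage:

```python
# Illustrative plumbing code typical of layered monoliths: it moves data
# between representations and contains almost no logic worth asserting on.
def to_view_model(customer):
    return {
        "id": customer.id,
        "name": customer.name,
        "email": customer.email,
    }
```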

Applying tests to existing systems

Code coverage is particularly difficult for existing or legacy code bases.

Part of the challenge is in the nature of unit tests. They involve breaking down functionality into self-contained "units" that do not have any external dependencies. This is very difficult to apply to a legacy code base. If code has not been designed from the ground up to be unit testable then you will struggle to apply any meaningful tests retrospectively.
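
As an illustrative sketch (the schema, mail server and function below are invented for the example), legacy code often welds its business rule to its external dependencies, so there is no self-contained "unit" to test without restructuring it first:

```python
import smtplib
import sqlite3

def apply_loyalty_discount(customer_id):
    # The pricing rule, the database access and the email notification are
    # tangled in one function: exercising the rule means having a real
    # database and a real mail server on hand, or refactoring first.
    conn = sqlite3.connect("production.db")
    row = conn.execute(
        "SELECT total_spend FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    discount = 0.1 if row and row[0] > 1000 else 0.0
    conn.execute(
        "UPDATE customers SET discount = ? WHERE id = ?",
        (discount, customer_id),
    )
    conn.commit()
    smtplib.SMTP("mail.example.com").sendmail(
        "noreply@example.com",
        f"customer-{customer_id}@example.com",
        f"Your discount is now {discount:.0%}",
    )
    return discount
```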

This is where chasing code coverage can become damaging, as it distorts refactoring efforts towards "testability". Some would argue that this is the optimal way to deal with legacy systems. Indeed, it is pretty much the central tenet of Michael Feathers' often-cited book on working with legacy systems. If your code base doesn't lend itself to this kind of refactoring, then you're out of luck.

Applying tests to existing code bases is hard. It usually involves far more than disentangling a few external dependencies. Isolating code into meaningful units generally requires widespread reorganisation that can quickly escalate into a major rewrite.

Remodelling a code base for test automation can consume whatever precious time a team is given to tackle technical debt. The quest for testability can become pathological, so that other concerns, such as scalability, resilience and security, are ignored or at least diminished. In this sense, chasing code coverage becomes counter-productive if teams are encouraged to forsake those concerns in pursuit of a target.

What should we measure instead?

Rather than measuring test coverage, it makes more sense to measure the outcomes that improved coverage is supposed to influence. You've probably got enough tests when defects tend not to escape into production and the development team are confident about making changes without regression problems.

A more meaningful measure of quality might be found in tracking the defects that escape into production. The rate of deployment can provide a practical view of the system's stability, especially if regular hot-fixes or patches are having to be released. The overall time it takes features to reach production can help to indicate whether development is being undermined by regressions.

All of these provide a more direct indicator of development health. Code coverage can be a useful diagnostic tool to expose gaps in testing, but it just doesn't mean anything without this more practical context.