Using architectural “fitness functions” as a guide to system design

Fitness functions are used in genetic programming to assess how close a candidate solution is to meeting a set of aims. They are applied across iterative simulations so that poorly performing solutions can be discarded and the output guided towards an optimal solution.
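
To make the mechanism concrete, here is a minimal sketch of a fitness function steering a toy evolutionary search towards a target value. The problem and all the names are invented purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class FitnessFunctionSketch {

    static final double TARGET = 42.0;
    static final Random RANDOM = new Random();

    // The fitness function: scores how close a candidate is to the aim.
    // Higher is better, so the distance from the target is negated.
    static double fitness(double candidate) {
        return -Math.abs(candidate - TARGET);
    }

    public static void main(String[] args) {
        // Start with a random population of candidate solutions.
        List<Double> population = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            population.add(RANDOM.nextDouble() * 100);
        }

        for (int generation = 0; generation < 50; generation++) {
            // Rank candidates from fittest to least fit...
            population.sort((a, b) -> Double.compare(fitness(b), fitness(a)));

            // ...discard the poorly performing half...
            List<Double> survivors = new ArrayList<>(population.subList(0, population.size() / 2));

            // ...and refill the population with mutated copies of the survivors.
            population = new ArrayList<>(survivors);
            for (double parent : survivors) {
                population.add(parent + RANDOM.nextGaussian());
            }
        }

        System.out.println("Fittest candidate found: " + population.get(0));
    }
}
```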

This is an idea that can be applied to the design of software systems. An evolutionary approach to architecture suggests that you can assess the suitability of a technical solution using objective and repeatable tests. These tests balance the need for rigor around architectural design with the need to support rapid change. You can evolve an architecture through iterative experimentation, using fitness functions to guide the way.

Identifying these tests as “fitness functions” may be stretching the evolutionary metaphor a little. You are not using functions to choose between different candidate solutions in a simulation of natural selection. You are assessing current state. The application of fitness functions has less to do with iteratively rejecting solutions and more to do with using metrics to guide future development effort.

Another problem is that it can be difficult to come up with objective and repeatable tests that measure the right things. In genetic programming a lot of investment is put into the definitions of fitness functions to make sure they correlate with design goals. Measuring the wrong thing will cause the process to converge on an inappropriate solution.

The same applies to fitness functions in an architectural setting. How do you measure architectural qualities such as coupling and cohesion? Are these even the right things to measure? In the search for meaningful metrics there is a risk that you end up groping for whatever comes conveniently to hand.

The problem with observing code

Given that code is a tangible artefact, it can be tempting to define architectural fitness functions that focus on the structure of code. Test frameworks also lend themselves to creating clear, atomic tests that can be incorporated into the build process.

You can use static analysis tools such as SonarQube to provide some indication of the general complexity of code. ArchUnit goes further by allowing you to enforce code structure rules as JUnit test assertions that can be incorporated into a continuous integration pipeline. This lets you define package structure, control package and class dependencies, regulate the flow between layers of an application and identify circular dependencies.
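
As a sketch of how this looks in practice, rules such as the following run as ordinary JUnit tests. The root package and the specific rules are placeholder assumptions for the example, not anything ArchUnit prescribes.

```java
import com.tngtech.archunit.junit.AnalyzeClasses;
import com.tngtech.archunit.junit.ArchTest;
import com.tngtech.archunit.lang.ArchRule;

import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.noClasses;
import static com.tngtech.archunit.library.dependencies.SlicesRuleDefinition.slices;

// "com.example.app" is a placeholder root package for the system under test.
@AnalyzeClasses(packages = "com.example.app")
public class ArchitectureRulesTest {

    // Domain code should not depend on the web layer.
    @ArchTest
    static final ArchRule domain_does_not_depend_on_web =
        noClasses().that().resideInAPackage("..domain..")
            .should().dependOnClassesThat().resideInAPackage("..web..");

    // No circular dependencies between top-level feature packages.
    @ArchTest
    static final ArchRule no_package_cycles =
        slices().matching("com.example.app.(*)..").should().beFreeOfCycles();
}
```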

This approach is quite limited as it takes a very narrow view of the system. A full picture of architectural health needs to take different dimensions into account beyond the technical implementation. You also need to consider aspects such as performance, security, data, operability and integration, all of which can be difficult to assess through an automated test.

Another problem is that you tend to get what you measure. Fitness functions based on metrics can very quickly become perceived as targets. This encourages engineering teams to “game” them to create an impression of architectural health. For example, setting a target for unit test coverage encourages developers to create meaningless wrapper tests that do little more than bump up the coverage statistics.
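
To make the gaming concrete, here is a hypothetical example: the test below executes the production code, so every line it touches counts towards the coverage figure, yet it asserts almost nothing about behaviour.

```java
import static org.junit.jupiter.api.Assertions.assertNotNull;

import java.math.BigDecimal;

import org.junit.jupiter.api.Test;

class InvoiceServiceCoverageTest {

    // Stand-in for a piece of production code with some non-trivial logic.
    static class InvoiceService {
        BigDecimal calculateTotal(String orderId) {
            // ... imagine real pricing logic here ...
            return new BigDecimal("100.00");
        }
    }

    // A "wrapper" test: it drives the production code, so every line executed
    // counts towards the coverage statistic, yet the lone assertion says
    // nothing about whether the calculated total is actually correct.
    @Test
    void calculateTotal_runsWithoutBlowingUp() {
        assertNotNull(new InvoiceService().calculateTotal("ORDER-123"));
    }
}
```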

Focusing on what matters

Perhaps architectural fitness isn’t something that can be measured by directly observing code or design. The real proof of architectural fitness comes from derived measures that describe how the system is being used and whether it is meeting expectations.

This can include more commercially oriented tests that reflect business priorities as much as technical implementation, for example:

  • How long does it take to deliver a feature, from conception to release?
  • How often are deployments being made and how many of them fail?
  • How long does it take to on-board a new customer once a sale has been made?
  • How many new support incidents are being received?
  • How much unscheduled downtime is occurring?
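
As a rough illustration, the first two measures in the list can be derived from deployment records. This is only a sketch: the Deployment record and the sample data stand in for whatever a real delivery pipeline would report.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

public class DeliveryMetrics {

    // Hypothetical record of a single deployment.
    record Deployment(Instant deployedAt, boolean failed) { }

    public static void main(String[] args) {
        // Sample data standing in for what a CI/CD system would report.
        List<Deployment> deployments = List.of(
            new Deployment(Instant.parse("2024-05-01T10:00:00Z"), false),
            new Deployment(Instant.parse("2024-05-03T15:30:00Z"), true),
            new Deployment(Instant.parse("2024-05-07T09:45:00Z"), false),
            new Deployment(Instant.parse("2024-05-10T14:20:00Z"), false)
        );

        Instant windowStart = Instant.parse("2024-05-01T00:00:00Z");
        Instant windowEnd = Instant.parse("2024-05-11T00:00:00Z");
        double days = Duration.between(windowStart, windowEnd).toDays();

        // Deployment frequency: deployments per day over the window.
        double frequency = deployments.size() / days;

        // Change failure rate: proportion of deployments that failed.
        double failureRate = deployments.stream().filter(Deployment::failed).count()
                / (double) deployments.size();

        System.out.printf("Deployments per day: %.2f%n", frequency);
        System.out.printf("Change failure rate: %.0f%%%n", failureRate * 100);
    }
}
```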

There can be a disconnect between the more tangible technical tests and the messier real world of commercial priorities and users. It’s worth noting that discussions around fitness functions rarely mention areas such as user experience or customer satisfaction. This could be because these are often intangible concepts that are difficult to measure. They certainly cannot be implemented as automated tests or put on a dashboard. This tends to encourage an overly internal, technical focus in architectural assessment, which risks losing sight of the things that really matter.

Avoiding metric-driven development

Are architectural fitness functions anything more than glorified metrics? That’s not really the point. It’s the way they are used that defines them. They can provide a base for a more iterative approach to architecture, helping to direct an evolving design towards a desired set of outcomes. It’s this support for iterative experimentation that sets them apart.

This does all rather hinge on coming up with a set of functions that accurately describe the desired outcome. The risk is that you tip over into a kind of metric-driven development where priorities are distorted by a narrow set of measurable criteria.

You also need to be pragmatic about how they are applied. There will be trade-offs and conflicts between different dimensions of a system, and functions should be reviewed for continued relevance. After all, as with any set of requirements, your understanding of what is important to a system will evolve as you develop it.