Continuous integration (CI) systems are often misunderstood by software organizations. They tend to be given insufficient resources, budget, and staff, and the impact they have on software development tends to be severely underestimated.
The reality is that companies practicing Scrum, test-driven development, or other practices that rely heavily on test automation will quickly face performance bottlenecks if they do not sufficiently fund, architect, and value the CI systems that execute build, test, and deploy workflows. The following are common failures in building a scalable CI system, along with suggested solutions.
There are two common provisioning and utilization failures that occur.
One is a system that is over-utilized and under-provisioned, which introduces a time bottleneck. The other is a system that is under-utilized and over-provisioned, which introduces a cost bottleneck. By far the greater of the two evils is under-provisioning (too few compute resources), which leads to extended delays in getting build and test results, to batching of changes into groups that are built and tested together, or to both. The lesser of the two evils is paying too much for hardware and computing resources that sit idle. I have seen both scenarios, in various forms, in the build and test CI systems of nearly every company I have worked for.
Over Utilized, Under Provisioned
Over Provisioned, Under Utilized
These issues spawn huge costs further down the software development chain. Developer time is costly, and it is wasted waiting for build and test results. Failures introduced in batched sets of changes are difficult to understand and resolve, because it is unclear which change in the batch caused them. Complex systems are created to try to mitigate these issues, but with limited success. People are hired to spend the majority of their time reviewing test failures and assigning them to the appropriate teams, when this work could have been automated if the system were configured and scaled out properly to build and test each change individually.
Both of these failures can be addressed with a dynamic cloud of nodes that are created, used, and destroyed according to job queue size.
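As a rough illustration of sizing the node pool from the job queue, here is a minimal Python sketch. The function names, the one-job-per-node default, and the pool limits are assumptions made for this example and are not taken from any particular CI product.

```python
import math


def desired_node_count(queued_jobs, running_jobs, jobs_per_node=1,
                       min_nodes=0, max_nodes=50):
    """Return how many nodes should exist for the current load.

    All limits are illustrative defaults, not recommendations.
    """
    total_jobs = queued_jobs + running_jobs
    needed = math.ceil(total_jobs / jobs_per_node)
    # Clamp to the pool limits so cost stays bounded.
    return max(min_nodes, min(max_nodes, needed))


def scaling_action(current_nodes, queued_jobs, running_jobs, **limits):
    """Return ('create', n), ('destroy', n), or ('hold', 0)."""
    target = desired_node_count(queued_jobs, running_jobs, **limits)
    if target > current_nodes:
        # Queue is backing up: add nodes to remove the time bottleneck.
        return ("create", target - current_nodes)
    if target < current_nodes:
        # Nodes are idle: remove them to remove the cost bottleneck.
        return ("destroy", current_nodes - target)
    return ("hold", 0)
```

A scheduler could call scaling_action on a timer and pass the result to whatever API actually creates and destroys machines. Because running jobs count toward the target, the number of nodes to destroy never exceeds the number of idle nodes as long as the pool limit has not been hit.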
Virtual Machine State Management
Another common failure is related to virtual machine state management.
Configuring operating systems by hand does not scale, but it is a common mistake. Reusing the same operating system instance over several runs is another. Both of these process and design flaws produce error-prone, stateful snowflake machines that behave non-deterministically. The resulting failures have to be investigated by people, which introduces more inefficiency and limits the system's ability to scale with the business.
These flaws can be addressed with machine-as-code template management. Virtual machines can be described as code that is kept in a source control system. All VM configuration changes must be versioned and tested before being deployed to production. Every build and test job runs in a totally clean, hermetically sealed operating system environment. There are many excellent frameworks to pick from. As an example, here are some simple Fabric and Ansible code snippets that turn wireless off.
from fabric.api import run, task

@task
def disable_wireless():
    run('networksetup -setairportpower en1 off')
- hosts: osx_nodes
  tasks:
    - name: turn wireless off
      shell: networksetup -setairportpower en1 off
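The idea that every job gets a clean machine that is thrown away afterwards can be sketched in Python with a context manager. InMemoryProvider is a hypothetical stand-in for a real virtualization or cloud API and only tracks node names. The names clean_node and run_job are likewise invented for this illustration.

```python
import itertools
from contextlib import contextmanager


class InMemoryProvider:
    """Hypothetical stand-in for a real VM API; it only tracks node ids."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.live = set()

    def create(self, template):
        # A real provider would boot a VM from the versioned template here.
        node = f"{template}-{next(self._ids)}"
        self.live.add(node)
        return node

    def destroy(self, node):
        self.live.discard(node)


@contextmanager
def clean_node(provider, template):
    """Yield a fresh node built from a template and always destroy it."""
    node = provider.create(template)
    try:
        yield node
    finally:
        # Runs even if the job raises, so no stateful machine is left behind.
        provider.destroy(node)


def run_job(provider, template, job):
    """Run one build or test job on its own throwaway node."""
    with clean_node(provider, template) as node:
        return job(node)
```

Because the node is destroyed in the finally clause, a failed job cannot leave state behind for the next one, which is the property that removes snowflake machines.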
Scalable Self-Service Build and Test with YAML
A third issue is trying to administer the system in a centralized way. This doesn't scale for large organizations as they grow.
This can be addressed by providing a self-service model. Each team specifies how its code should be built and tested in a YAML configuration file that is checked into the repo itself and versioned along with the code. If the way the code is built or tested changes, the developers simply update this file and check it in. Here is an example YAML config file for a simple Python project:
# command to install dependencies
- pip install .
- pip install -r requirements.txt
# command to run tests