5 Minute DevOps: The Value of Code Coverage
Code coverage is the measure of the amount of code being executed by tests. It’s an easy number to measure and a minimum value is frequently set as a goal. What is the actual value of code coverage?
True story, I was once in an organization where management told us that one of our annual goals was to ensure all applications had 90% code coverage. See, we were just starting to push testing in the area and they wanted to track our progress to make sure testing was happening. It was an interesting social experiment that I frequently see repeated in the industry. I learned a bit about people and measuring goals from that.
What was the outcome? Test coverage went up, naturally.
Let’s try this out. I’ve been given the task to write a function that will add two whole numbers. I have requirements that it should return the sum of two whole numbers but if a decimal value or a non-number is passed, then I need to return the JavaScript value for “Not a Number”. Should be easy.
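A minimal sketch of a function that would satisfy those requirements as stated. The body is an assumption reconstructed from the behavior described in this post, not necessarily the exact original listing.

```javascript
// Sketch (assumed implementation): add two whole numbers, otherwise return NaN.
function addWholeNumbers(a, b) {
  // A whole number leaves no remainder when divided by 1.
  // Non-numeric input such as "A" yields NaN here, and NaN !== 0 is true,
  // so it is rejected by the same check.
  if (a % 1 !== 0 || b % 1 !== 0) {
    return NaN;
  }
  return a + b;
}

console.log(addWholeNumbers(2, 2));
console.log(addWholeNumbers(2.1, 2));
console.log(addWholeNumbers("A", 2));
```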
So I have a function that will check both inputs to verify that they are evenly divisible by 1. If they are not, then it returns NaN. Since alpha characters are also not divisible by 1, those will be trapped too. Naturally, we have tests for this because I am also told I need over 90% test coverage. Let's run the coverage report!
code-coverage-ex (main ✔) » npm run coverage

Adding two whole numbers
✓ should return the sum of two whole numbers
✓ should return NaN if one number is not a whole number

2 passing (5ms)

--------------------|---------|----------|---------|---------|
File | % Stmts | % Branch | % Funcs | % Lines |
--------------------|---------|----------|---------|---------|
All files | 100 | 100 | 100 | 100 |
addWholeNumbers.js | 100 | 100 | 100 | 100 |
--------------------|---------|----------|---------|---------|
There we go, 100% coverage for every statement, code branch, and function. The gold standard of quality.
I even have a nice report for the product owner and my manager.
I’m done and ready to hand this off to whoever will be doing support for me. I’m a developer, after all. Support is someone else's problem.
Later that month, my change is delivered to production and customers start using my 100% covered code to add products to their shopping carts. Random quantity issues begin occurring and customers are getting angry about overcharges and massive over-shipment of products. Weird. We have really high code coverage. What went wrong?
Let’s log the output from the function:
2 + 2 = 4
2.1 + 2 = NaN
A + 2 = NaN
2 + 2 = 22
So, the first three runs look great. What happened to the last run? Let’s look at the code.
console.log(`2 + 2 = ${addWholeNumbers(2, 2)}`);
console.log(`2.1 + 2 = ${addWholeNumbers(2.1, 2)}`);
console.log(`A + 2 = ${addWholeNumbers("A", 2)}`);
console.log(`2 + 2 = ${addWholeNumbers("2", 2)}`);
Hang on. That last run should have returned NaN too. We tested for strings.
The tests above test very little that is meaningful to ensure the function works correctly. They were written to meet the goal of high code coverage. The customers didn’t care about that goal. The goal they cared about was that the function worked as intended. JavaScript is a very friendly language for rapid development. It coerces types for you. 2 + 2 = 4, but the + operator is also the string concatenation operator, so "2" + 2 = "22". Numeric strings can also be used with arithmetic operators, so the modulus operator the function uses to verify a whole number will coerce a numeric string to a number as well. This is an easy mistake that could be caught in a review of the tests if the size of the code review is small enough. All we need to do is add a test for this problem.
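The coercion described above, and the missing test, can be sketched as follows. The naive function is the assumed implementation matching the behavior in this post. The name addWholeNumbersStrict and its use of Number.isInteger are hypothetical, shown only as one possible way to make the new test pass. Node's built-in assert module stands in for the test runner so the snippet is self-contained.

```javascript
const { strictEqual, ok } = require('assert');

// Naive version (assumed): relies on % to detect non-whole inputs.
function addWholeNumbers(a, b) {
  if (a % 1 !== 0 || b % 1 !== 0) {
    return NaN;
  }
  return a + b;
}

// Coercion at work: % converts "2" to a number, while + concatenates.
strictEqual("2" % 1, 0);
strictEqual("2" + 2, "22");
strictEqual(addWholeNumbers("2", 2), "22");

// Hypothetical stricter variant: Number.isInteger does not coerce strings.
function addWholeNumbersStrict(a, b) {
  if (!Number.isInteger(a) || !Number.isInteger(b)) {
    return NaN;
  }
  return a + b;
}

// The missing test: a numeric string must be rejected.
ok(Number.isNaN(addWholeNumbersStrict("2", 2)));
// Existing behavior is preserved.
strictEqual(addWholeNumbersStrict(2, 2), 4);
ok(Number.isNaN(addWholeNumbersStrict(2.1, 2)));
```

The same numeric-string assertion run against the naive version would fail, which is exactly the signal the original test suite never produced.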
The scenario above assumes the best intentions but naive testing skills. We can fix this by focusing on the actual goal, “we are confident we are meeting the customer’s needs” and then teaching people how to test.
Let’s cover another scenario that I’ve personally witnessed as an outcome of “you must have 90% code coverage!”. I was code reviewing some old code one day. Reviewing tests is really the first priority anyone should have. I came across something terrifying. The code had very high coverage and was completely untested. Using the function above as an example, the tests were similar to this:
it('Should add two whole numbers', () => {
  addWholeNumbers(2, 2)
  addWholeNumbers(1.1, 0)
})
This test will also report 100% code coverage. It doesn’t actually test anything: the function is called, but the results are never checked. If you are judging a team on code coverage, how many of your tests look like this?
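To make the difference concrete, here is a small sketch comparing an assertion-free "test" with one that checks results. Everything in it is hypothetical and for illustration only: brokenAddWholeNumbers is a deliberately wrong implementation, and the helper passes simply reports whether a test function completes without throwing. Node's built-in assert module is used in place of a test runner.

```javascript
const { strictEqual, ok } = require('assert');

// Deliberately broken implementation (hypothetical): subtracts instead of adding.
function brokenAddWholeNumbers(a, b) {
  if (a % 1 !== 0 || b % 1 !== 0) {
    return NaN;
  }
  return a - b;
}

// Assertion-free "test": executes every line and branch, verifies nothing.
function coverageOnlyTest(add) {
  add(2, 2);
  add(1.1, 0);
}

// Same inputs, but the results are checked.
function assertingTest(add) {
  strictEqual(add(2, 2), 4);
  ok(Number.isNaN(add(1.1, 0)));
}

// Helper: true if the test function completes without throwing.
function passes(testFn, add) {
  try {
    testFn(add);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(passes(coverageOnlyTest, brokenAddWholeNumbers)); // true
console.log(passes(assertingTest, brokenAddWholeNumbers));    // false
```

Both versions exercise the same statements and branches, so a coverage tool would score them identically. Only the second one notices that the function is wrong.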
Testing is what professional software developers do. Pros have pride in their work and want to deliver working solutions to business problems. However, everyone will adapt to the environment they are placed in or they will choose to move to another environment. If developers are pushed to meet dates and push out features as fast as possible, quality processes are squeezed out. If they are then told to improve code coverage, code coverage will go up. The only thing scarier than untested code is code tested with tests you don’t trust.
What is the value of code coverage reporting? To find untested code that should be tested. There should never be a “code coverage goal”. Instead, we should focus on what the customer needs and use an actual quality process: reduce the size of changes, focus on improving the quality signal from the pipeline, and deliver changes rapidly to find out if our fully tested application is tested using bad assumptions.
Only our users can define quality. A test is only our prediction of what they think. Deliver so you can test the test.