5 Minute DevOps: Testing 101
Continuous delivery is a way of working to reduce cost and waste, not just build/deploy automation. For CD to be effective, we must identify and fix poor quality as early in the process as possible, never let poor quality flow downstream, and do this efficiently and quickly. Central to this is the discipline of continuous integration. CI requires very high confidence that code is working before it is checked in, that those check-ins happen at least daily, and that every change is tested on the trunk by the CI server. This requires efficient and effective testing as a core activity by everyone on the team.

However, getting started can be daunting. There’s so much conflicting information available, much of it bad. People new to testing can get overwhelmed by the sheer volume of it and tend to expect testing to be complicated. There is also no common language for testing patterns. Ask three people what an “integration test” is and you’ll get four answers.
The reality is that testing isn’t that hard; it’s just code written for a different purpose. All that’s required is to pick some good source material, remember a few guidelines, understand the goals, and practice writing tests.
Good Sources
These are good places to start:
- Martin Fowler’s articles on testing (martinfowler.com)
- “Write tests. Not too many. Mostly integration.” by Kent C. Dodds
- “xUnit Test Patterns” by Gerard Meszaros
Now that we have some references that don’t suck, let’s hit the high points.
Some terms
- White-box testing: Testing the implementation. Inside out.
- Black-box testing: Testing the behavior. Outside in.
These are relative to what you’re testing, the system under test (SUT). If you’re testing a complicated function to verify it’s working correctly by sending inputs and asserting the output is as expected, then it’s a black-box test. If you are testing a workflow through a service relying only on tests that validate each method, you are white-box testing and have a very brittle test architecture that will make refactoring toilsome.
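For example, here is a minimal black-box test sketch in pytest style. The `apply_discount` function is invented for illustration; the point is that the test only sends inputs and asserts on outputs, so the implementation underneath can be refactored freely.

```python
# A minimal black-box test sketch (pytest style). The SUT is a
# hypothetical pricing function; the tests never inspect internals.

def apply_discount(price: float, loyalty_years: int) -> float:
    """The system under test: 5% off per loyalty year, capped at 25%."""
    discount = min(loyalty_years * 0.05, 0.25)
    return round(price * (1 - discount), 2)

def test_discount_is_capped_at_25_percent():
    # Black-box: assert on observable behavior only, so the
    # implementation of apply_discount can change without breaking this.
    assert apply_discount(100.00, 10) == 75.00

def test_no_loyalty_means_no_discount():
    assert apply_discount(100.00, 0) == 100.00
```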
Goals
- We want to be confident that we can deploy this change without breaking something. This is very different from 100% certainty. The level of confidence required is based on risk.
- We want most (80%-90%) of that confidence before we push that change to version control. This means we build tests that can run locally, off-network, and quickly.
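One way to get there is to design the SUT so its network dependencies can be swapped for in-memory fakes. A sketch, with all names invented for illustration:

```python
# A sketch of an off-network test: the real HTTP client is replaced
# by an in-memory fake, so the test runs locally and fast.
# All names here (RateClient-style fake, convert) are hypothetical.

class FakeRateClient:
    """Stands in for a client that would normally call a rate API."""
    def get_rate(self, from_ccy: str, to_ccy: str) -> float:
        return {("USD", "EUR"): 0.9}[(from_ccy, to_ccy)]

def convert(amount: float, from_ccy: str, to_ccy: str, client) -> float:
    # The SUT takes its client as a parameter, which is what makes
    # swapping in a fake possible.
    return round(amount * client.get_rate(from_ccy, to_ccy), 2)

def test_convert_uses_current_rate():
    assert convert(10.0, "USD", "EUR", FakeRateClient()) == 9.0
```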
Not Goals
- Code coverage. Do not make code coverage a goal. Code coverage is an outcome of good practice, not an indicator that good practice exists. High code coverage with crappy tests is worse than low coverage you trust. At least with low coverage, you KNOW what’s not tested. Crappy tests hide that information from you. Use code coverage reports to find what really needs to be tested or to find dead code, but don’t fall for the vanity metric.
- Comprehensive system tests. Requiring a full system test for every deploy is an indicator of poor application or test architecture. This isn’t to say that end-to-end tests aren’t useful, but they should be minimal. Creating one of these monsters isn’t an achievement. It’s just money that could have been better spent on more efficient tests.
Good Practices
- Make the test fail before you make it pass. Writing tests that cannot fail is a pretty common error. Don’t suffer from misplaced confidence. This still bites me sometimes. Some would say I’m advocating TDD. Yes, I’m advocating TDD. It reduces the rework needed to make code testable and speeds up the next feature. Stop debating religious wars and test first unless you have no need to ever change the application again. (See the first sketch after this list.)
- Prefer black-box testing. Test WHAT is happening, not HOW it’s happening. It may be useful to use white-box tests as temporary scaffolding while you make a change, but refactor them to black-box or you will tightly couple the tests to the implementation. You should be able to freely refactor code without changes to tests. If you can’t, fix your test architecture.
- Prefer testing workflows instead of using lots of unit tests. Fowler calls these “sociable unit tests”. Kent C. Dodds calls this “integration testing”: you integrate several units into a workflow but do not cross network interfaces (see the second sketch after this list). I recommend his article on it: “Write tests. Not too many. Mostly integration.”
- Don’t over-test. If your SUT depends on other components, make sure your test does not test the behavior of the other component. If you have behavior tested at a lower level, don’t retest that behavior with a test with a larger scope. Two tests testing the same thing means both need maintaining and one is probably wrong.
- Don’t under-test. Unit tests are not enough. Full system tests used as the primary form of testing cannot fully traverse all of the paths without excessive cost.
- Record/replay testing tools are heroin. They are the Dark Side. Easier, quicker, more seductive. They are good for exactly one use case: baselining the behavior of untested legacy code while you refactor and test correctly. They do not replace proper testing and will become a constraint.
- Use TDD. TDD is a design tool that results in cleaner code and better tests and is the best way to get good at testing quickly. You’ll spend less time in maintenance mode if you learn this discipline. You can rant and rave all you want about religious wars, but you’ll still be spending more effort on the same outcomes if you assume it’s BS.
- Terminate flakiness with extreme prejudice. Flaky tests that randomly fail will corrupt your quality feedback while also costing time and money. Don’t be afraid to deactivate them and write better replacements. Some tests inherently tend to be flaky. Tests that require a database or another external component are an example. Minimize their use and keep the flaky tests out of your CD pipelines.
- Don’t use humans for robot work. There is one use case for manual testing: testing anything not based on rules. Exploratory testing is a prime example. We need evil human brains to think outside the box and explore what might break that we didn’t consider. If humans are using a script to test, they are being used as really slow, expensive, and unrepeatable robots.
- Don’t assume the requirements are correct. This is one of the main reasons for CD: to get a small change to the end-user to find out how wrong we are. A test written to meet the requirements can therefore still be wrong. Start by defining the requirements with declarative, testable acceptance criteria (see the third sketch after this list), then test the tests by getting user feedback on small changes ASAP. The faster you can do this, the less money you will burn being wrong.
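First, a red/green sketch of “make it fail first” (pytest style; `slugify` is an invented example). The test is written before the code exists and run once to prove it can fail, then just enough code is written to make it pass.

```python
# "Make it fail first" in pytest terms; names are invented.

# Step 1 (red): write the test before the code exists. Running it
# now fails, which proves the test CAN fail.
def test_slug_replaces_spaces_with_hyphens():
    assert slugify("hello world") == "hello-world"

# Step 2 (green): write just enough code to make the test pass.
def slugify(text: str) -> str:
    return text.strip().lower().replace(" ", "-")
```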
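Second, a sketch of a sociable workflow test: `Cart` and `checkout` are real collaborators exercised together, and only the component that would cross the network is faked. All names are invented for illustration.

```python
# A "sociable" workflow test: real units wired together, with only
# the network edge (the notifier) replaced by a fake.

class RecordingNotifier:
    """Fake for the one component that would cross the network."""
    def __init__(self):
        self.sent = []
    def send(self, message: str) -> None:
        self.sent.append(message)

class Cart:
    def __init__(self):
        self.items = []
    def add(self, name: str, price: float) -> None:
        self.items.append((name, price))
    def total(self) -> float:
        return sum(price for _, price in self.items)

def checkout(cart: Cart, notifier) -> float:
    total = cart.total()
    notifier.send(f"Order placed: ${total:.2f}")
    return total

def test_checkout_workflow_notifies_with_order_total():
    # Integrates Cart and checkout as real collaborators; only the
    # notifier is faked, so the test stays local and fast.
    cart = Cart()
    cart.add("book", 12.50)
    cart.add("pen", 2.50)
    notifier = RecordingNotifier()

    assert checkout(cart, notifier) == 15.0
    assert notifier.sent == ["Order placed: $15.00"]
```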
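Third, a sketch of turning a declarative acceptance criterion into a test. The criterion and the `Account` class are invented for illustration: “Given an account with $100, when $30 is withdrawn, then the withdrawal succeeds and the balance is $70.”

```python
# A declarative, testable acceptance criterion expressed as a test.
# The Account class is hypothetical.

class Account:
    def __init__(self, balance: float):
        self.balance = balance
    def withdraw(self, amount: float) -> bool:
        if amount > self.balance:
            return False
        self.balance -= amount
        return True

def test_withdrawal_reduces_balance():
    # Given an account with $100
    account = Account(balance=100.0)
    # When $30 is withdrawn
    ok = account.withdraw(30.0)
    # Then the withdrawal succeeds and the balance is $70
    assert ok
    assert account.balance == 70.0
```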
This is just the start and really only focused on the basics needed for CI. The book “xUnit Test Patterns” is nearly 900 pages long and doesn’t cover everything. There are many other kinds of tests that might be required: contract, performance, resilience, usability, accessibility, etc. If you’re trying to do CD, dig into testing. If you say you’re doing CD and aren’t focused on efficient and effective detection of poor quality early in the flow, are you sure you’re doing CD?