5-Minute DevOps: Optimize for Sleep!
In a recent thread about quality ownership, I saw this comment from an agile coach that made me sad.
“So every time you walk in the door, quality is top of mind, and it’s completely part of the definition of done? I think not. Lack of quality and debt are very real.”
This person doesn’t work with competent management. That outcome is caused by hiring low-cost coders who have no product and operational responsibility. “Quality” is extra work for them because there are no incentives to focus on it. Ops drives quality, not specs.
I responded, “This is top of mind for me every single time.” Why? Trauma.
I talk frequently about continuous delivery (CD) and everything required to do it well. I push back on destructive legacy practices that harm CD because, without CD, we put operations at risk. I’ve had too many decades of sleepless nights supporting the systems I build. When I was first exposed to CD, it was an “aha!” moment: here was a cure for those sleepless nights. Now, I work backward from that goal and optimize for sleep.
Goal: When things break, be able to fix them quickly without workarounds and go back to sleep.
To meet this goal, everything is designed around operations.
- The delivery pipeline is feature 0. Before I write a feature, I need a way to test and deliver that feature. So the first test verifies that I can ship “Hello world.” When I’ve worked in places where this wasn’t possible, I delayed coding for as long as I could. Then I stressed over every change that sat waiting for a pipeline to be built, because I knew rework was coming.
- That pipeline is only effective if I’ve defined the acceptance criteria for the feature and can implement them as tests in the same change. Those tests must be fully automated because I cannot allow delivery workarounds and still meet my goal. Therefore, I cannot allow manual acceptance testing, and I cannot allow flaky or slow tests.
- I cannot assume that I can effectively test the entire system in a reasonable amount of time as it grows. The math works against me here. The larger the thing I’m trying to test, the more paths need to be tested, the longer it takes, and the harder it is to control variables. So, I need an architecture that is easy to test and that gives me high confidence that a change to one part of the system will not break another part. That means a modular architecture with a clear separation of concerns. It also means I need to take domain-driven design seriously and use a “contract first” approach so APIs can evolve without breaking their consumers.
- I need every API described in machine-readable documentation so I can easily create virtual services and remove the need for end-to-end (E2E) testing of all components. E2E tests are slow and flaky, and slow and flaky tests will be bypassed at 3 am. (See the second point above.)
- I need the code to be modular and easy to test. I use unit tests to drive modularity at that level. I also focus on writing pure functions and isolating side effects because pure functions are easier to reason about and easier to test.
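The acceptance-criteria point above can be illustrated with a minimal Python sketch. It writes one criterion as a fast, deterministic automated test. The criterion itself (sessions expire after 30 minutes of inactivity) and the names `Session`, `is_expired`, and `SESSION_TIMEOUT` are hypothetical. The technique is to inject the clock, so the test controls time instead of sleeping. That keeps the test fast and removes one common source of flakiness.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable

# Hypothetical acceptance criterion: a session expires after 30 minutes idle.
SESSION_TIMEOUT = timedelta(minutes=30)


@dataclass
class Session:
    """Hypothetical session record, used only for this illustration."""
    user: str
    last_seen: datetime


def is_expired(session: Session, now: Callable[[], datetime]) -> bool:
    """The clock is passed in, so tests control time instead of sleeping."""
    return now() - session.last_seen > SESSION_TIMEOUT


def test_session_expires_after_30_minutes_of_inactivity() -> None:
    # Given a session last seen at a fixed instant (no real clock involved)
    start = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)
    session = Session(user="on-call", last_seen=start)
    # When 29 minutes pass, the session is still valid
    assert not is_expired(session, lambda: start + timedelta(minutes=29))
    # When 31 minutes pass, the session has expired
    assert is_expired(session, lambda: start + timedelta(minutes=31))


if __name__ == "__main__":
    test_session_expires_after_30_minutes_of_inactivity()
    print("acceptance criterion met")
```

Because nothing waits on a real clock, this test runs in milliseconds and gives the same answer every time, including at 3 am.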
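The contract-first and virtual-service points above can be sketched in Python under a deliberately simplified contract format. `INVENTORY_CONTRACT`, `make_virtual_service`, and `is_available` are hypothetical names. A real team would more likely publish an OpenAPI document and use service-virtualization tooling that reads it. The consumer depends only on a transport function, so a test can replace the live service with a stub generated from the contract. No E2E environment is needed.

```python
import json
from typing import Callable

# Hypothetical machine-readable contract for an "inventory" API. It only
# mimics the idea of an OpenAPI document: a route plus an example response.
INVENTORY_CONTRACT = {
    "GET /items/{sku}": {
        "status": 200,
        "example": {"sku": "ABC-123", "in_stock": 7},
    }
}

# A transport takes (method, path) and returns (status, JSON body).
Transport = Callable[[str, str], tuple[int, str]]


def make_virtual_service(contract: dict) -> Transport:
    """Build an in-memory stub that answers only the routes the contract declares."""
    def transport(method: str, path: str) -> tuple[int, str]:
        for route, spec in contract.items():
            route_method, template = route.split(" ", 1)
            if route_method != method:
                continue
            # Match "/items/{sku}" against "/items/XYZ" segment by segment.
            expected = template.strip("/").split("/")
            actual = path.strip("/").split("/")
            if len(expected) == len(actual) and all(
                e.startswith("{") or e == a for e, a in zip(expected, actual)
            ):
                return spec["status"], json.dumps(spec["example"])
        return 404, json.dumps({"error": "route not in contract"})
    return transport


def is_available(sku: str, transport: Transport) -> bool:
    """Consumer code under test: it knows the contract, not the live service."""
    status, body = transport("GET", f"/items/{sku}")
    if status != 200:
        return False
    return json.loads(body)["in_stock"] > 0


if __name__ == "__main__":
    stub = make_virtual_service(INVENTORY_CONTRACT)
    print(is_available("ABC-123", stub))
```

If the provider changes the contract, the stub changes with it. Consumer tests then fail fast in the pipeline instead of in production.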
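For the point about pure functions, one common way to isolate side effects is to keep calculations in pure functions and push I/O to a thin outer layer. This is often called “functional core, imperative shell.” The minimal Python sketch below uses hypothetical names (`LineItem`, `order_total`, `checkout`). The `load_items` and `save_total` callbacks stand in for whatever database or API a real system would use.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable


@dataclass(frozen=True)
class LineItem:
    """Hypothetical order line, used only for this illustration."""
    sku: str
    unit_price: Decimal
    quantity: int


def order_total(items: list[LineItem], tax_rate: Decimal) -> Decimal:
    """Pure: same inputs always give the same output, with no I/O."""
    subtotal = sum((i.unit_price * i.quantity for i in items), Decimal("0"))
    return (subtotal * (1 + tax_rate)).quantize(Decimal("0.01"))


def checkout(
    order_id: str,
    tax_rate: Decimal,
    load_items: Callable[[str], list[LineItem]],
    save_total: Callable[[str, Decimal], None],
) -> Decimal:
    """Thin shell: every side effect (read, write) happens here, at the edges."""
    items = load_items(order_id)          # side effect: read
    total = order_total(items, tax_rate)  # pure calculation
    save_total(order_id, total)           # side effect: write
    return total


if __name__ == "__main__":
    saved: dict[str, Decimal] = {}
    items = [LineItem("A-1", Decimal("10.00"), 2), LineItem("B-2", Decimal("5.00"), 1)]
    total = checkout("order-42", Decimal("0.08"), lambda _id: items, saved.__setitem__)
    print(total, saved)
```

The pure `order_total` can be unit tested exhaustively with plain values. The shell is small enough that a handful of tests with fake callbacks covers it.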
Everything I do and talk about is optimizing for operations because if we do not optimize for that, we cannot respond effectively to security, functional, stability, or availability problems.
Work Smarter, Not Harder
Going back to the original comment, which scoffed at the idea that developers would focus on anything other than pumping out code: my follow-up conversations confirmed my assumptions about how incompetent the management was. They hired cheap, offshore coding mills to do development. In my personal experience, those developers are new to the industry and are trying to level up their résumés to get a “real” job. They have no long-term relationship with the product they are building, no operational responsibility, and no incentive to do things correctly because the outcome is someone else’s problem. Is this their fault? No! It’s the fault of the executives who are “saving money” on $35/hr developers. If you work for people that dumb, stop being frustrated and stop blaming the developers. No, you can’t fix it by hiring more testers. You CAN fix it by quitting and working for someone competent.