Software Quality Testing: Creating Quality Filtration Stacks

There are no silver bullets for managing software quality

Allison Perkel

September 23, 2020

Software quality. It’s a term that gets talked about in the context of magic processes that promise to fix all the things, make your coffee in the morning, and tuck you into bed at night. I’ve been in software development for over 25 years. I’ve built mission-critical software products and led the teams building them. Over the years, I’ve worked with amazing people who’ve shaped how I think about software quality and how to build software with quality as a first-class citizen. In this post, I’d like to pass on some of the knowledge I’ve gained and start a real discussion about what it means to care about software quality management.

Software Quality is Built Upon Layers

There is no silver bullet for software quality. Across the industry, what improves a product is having different kinds of software tests and software quality metrics at different levels. Visualizing software quality testing as layers of filters helped me and my teams design tests for every layer.

Think of a water filter: There are several substrates, each at different levels and each designed to filter out different particulates from the water. Managing software quality is like that. Each layer of the software quality stack is designed to filter out a different kind of issue. For me, this was a eureka moment. By breaking down testing strategy, we can focus on the areas that are most critical for us and, more importantly, for our customers. I can’t take credit for creating this view: my ideas were shaped by working closely with another software engineering leader, Todd Stadelhofer.

When we apply the filter methodology to software quality management, we get the Quality Filtration Stack. This concept shows where you have software quality test coverage and, with a simple traffic light color scheme, where you should focus on increasing it.

[Figure: a quality filtration stack, with layers shaded red, yellow, and green]
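As a rough illustration of the traffic light view, the Python sketch below colors each layer of a stack from a per-layer coverage score. The layer names come from this post, but the idea of a single 0.0 to 1.0 coverage number per layer and the 0.5 and 0.8 cutoffs are hypothetical assumptions; each team would choose its own measures and thresholds.

```python
from enum import Enum


class Light(Enum):
    RED = "red"
    YELLOW = "yellow"
    GREEN = "green"


# Hypothetical thresholds; tune these to your team's definition of "enough" coverage.
YELLOW_AT = 0.5
GREEN_AT = 0.8


def light_for(coverage: float) -> Light:
    """Map a coverage fraction in [0.0, 1.0] to a traffic light color."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError(f"coverage must be in [0, 1], got {coverage}")
    if coverage >= GREEN_AT:
        return Light.GREEN
    if coverage >= YELLOW_AT:
        return Light.YELLOW
    return Light.RED


def stack_report(coverage_by_layer: dict[str, float]) -> dict[str, Light]:
    """Color every layer of the filtration stack, preserving the caller's layer order."""
    return {layer: light_for(cov) for layer, cov in coverage_by_layer.items()}


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from any real team.
    stack = {
        "code reviews and unit tests": 0.85,
        "functional tests": 0.6,
        "code scanning": 0.3,
        "system and long-lived tests": 0.1,
    }
    for layer, light in stack_report(stack).items():
        print(f"{light.value:>6}  {layer}")
```

The hard part in practice is deciding what "coverage" means for each layer (line coverage for unit tests, requirements traced for functional tests, and so on); the coloring itself is the easy step.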

For the filtration stack to succeed, there are additional requirements for these layers. The most successful test systems have the following six things in common:

Results are easy to see.
Logs and debug data are easy to get to.
Test results have a low false positive/negative rate.
Developers are in the mix and have ownership.
Continuous integration and deployment feed automated testing and test environments.
A feedback cycle continually incorporates results.

The more friction a developer experiences when working with test results, the less likely it is that the developer will use that particular software quality management framework. A strong DevOps culture promotes ownership, and this is one of the big reasons DevOps as a mindset is so successful. A reliable CI/CD process is critical to having a fully functional software quality management system.

There are base filters that I think every software development team should have. These filters are crucial when building out a complete framework for producing quality software. Having these filters in place doesn’t guarantee bug-free software, but not having a quality stack does guarantee that customers will find issues with your application. Your filtration stack will look different from another team’s, and that’s the point. What works for kernel developers will be different from what works for cloud or mobile engineering teams.

Software Quality Testing: Layers of the Stack

Software quality testing filtration layers, much like the layers of a water filter, are designed to catch different types of issues. The functional layer checks base requirements, unit tests filter out boundary issues, system tests filter out problems in interactions, and so on. Let’s take a closer look at what each layer is designed to filter for.

Code Reviews and Unit Testing

At the bottom of the stack are the ubiquitous unit tests and code reviews. Code reviews should be required before committing work; this step changes how developers think about the code they check in. Unit tests covering both positive and negative conditions are critical for understanding how a change alters the way parts of the software interact.

Here’s an example from personal experience. A long time ago, I coded finite state machines as a “State Event Matrix.” There were many states where I assumed we’d never receive a particular event. Following the principles of unit testing, I routed every unexpected state/event pair to a function that printed out a stack trace and enough debug information to go back and fix the issue. This was also where I put in my bad joke: “This error function can work in all States. Even Canada.” I think it’s clear I did not have code reviews.
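The original state machines aren’t shown here, so the Python below is only a minimal sketch of the State Event Matrix idea, with hypothetical states and events. Every state/event pair starts out mapped to a catch-all handler that prints a stack trace, and only the expected transitions are filled in.

```python
import traceback
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    CONNECTED = auto()


class Event(Enum):
    CONNECT = auto()
    DATA = auto()
    DISCONNECT = auto()


def unexpected(state: State, event: Event) -> State:
    """Catch-all for state/event pairs we assumed would never happen."""
    print(f"unexpected event {event.name} in state {state.name}")
    traceback.print_stack()  # enough context to go back and fix the issue
    return state  # stay put rather than guess at a transition


def connect(state: State, event: Event) -> State:
    return State.CONNECTED


def receive(state: State, event: Event) -> State:
    return State.CONNECTED


def disconnect(state: State, event: Event) -> State:
    return State.IDLE


# Every cell starts as the catch-all; only the expected transitions are filled in.
MATRIX = {(s, e): unexpected for s in State for e in Event}
MATRIX[(State.IDLE, Event.CONNECT)] = connect
MATRIX[(State.CONNECTED, Event.DATA)] = receive
MATRIX[(State.CONNECTED, Event.DISCONNECT)] = disconnect


def dispatch(state: State, event: Event) -> State:
    """Look up and run the handler for the current state/event pair."""
    return MATRIX[(state, event)](state, event)
```

Because the matrix covers every pair, a unit test can loop over all of them, including the “impossible” ones, and check that each either transitions correctly or lands in the catch-all.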

Functional Testing

As we work our way up the stack, the next level is functional testing. These tests should be automated and organized into groups, with different groups running depending on which branch is being built, so we can verify that the software meets its requirements.
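One way to set up branch-dependent groups is a small lookup from branch-name patterns to test groups, as in the hypothetical Python sketch below. The branch naming scheme (main, release/*, feature/*) and the group names are assumptions for illustration; most CI systems offer their own way to express the same rule.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping from branch-name patterns to functional test groups.
# The first matching pattern wins, so list the most specific patterns first.
GROUPS_BY_BRANCH = [
    ("main", ["smoke", "core", "regression", "performance"]),
    ("release/*", ["smoke", "core", "regression"]),
    ("feature/*", ["smoke", "core"]),
]
DEFAULT_GROUPS = ["smoke"]


def groups_for(branch: str) -> list[str]:
    """Return the functional test groups to run for a build of `branch`."""
    for pattern, groups in GROUPS_BY_BRANCH:
        if fnmatchcase(branch, pattern):  # case-sensitive on every platform
            return list(groups)  # copy so callers can't mutate the table
    return list(DEFAULT_GROUPS)
```

The design choice here is that every branch runs at least a fast smoke group, while the slower groups are reserved for branches closer to a release.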

Verifying that software meets requirements is an age-old problem. Many years ago I worked on databases, specifically at the communication layer. For my functional tests, I needed to “hit physics speed,” which meant I needed a way to measure how fast I could transmit data. By creating a test framework for communications, I was able to prove my communications code was fast, and I could see, via a color-coded test result, whether my changes caused the communication speed to increase, decrease, or remain the same.
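The original framework isn’t shown, so the Python below is only a sketch of the same idea: time a transmit function, then classify the result against a stored baseline as faster, slower, or unchanged. The `send` callable, the 5% tolerance, and the color mapping are hypothetical assumptions.

```python
import time
from typing import Callable

# Hypothetical color coding for the three possible outcomes.
COLORS = {"faster": "green", "same": "yellow", "slower": "red"}


def measure_throughput(send: Callable[[bytes], None], payload: bytes, repeats: int = 100) -> float:
    """Return bytes per second achieved by calling send(payload) `repeats` times."""
    start = time.perf_counter()
    for _ in range(repeats):
        send(payload)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero-length interval
    return len(payload) * repeats / elapsed


def compare(current: float, baseline: float, tolerance: float = 0.05) -> str:
    """Classify throughput relative to the baseline, treating changes within tolerance as noise."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    change = (current - baseline) / baseline
    if change > tolerance:
        return "faster"
    if change < -tolerance:
        return "slower"
    return "same"
```

A baseline is only meaningful if it comes from a previous run on the same hardware and environment, so a setup like this depends on a stable test environment as much as on the code.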

Code Scanning

Code scanning tools look for things that take a long time to review by hand: memory leaks, dead code, coverage paths, open source code use, and packages. This layer catches the bugs that get past code reviews and functional tests: it can point you to areas of concern, potential deadlocks, and code paths the software should never go down. In today’s world, tools do this job much better than humans can.

A long time ago I joined a company and several of the engineers insisted they did not have memory leaks and that their code was perfect. I challenged them to prove it via automated testing. They built up the testing rig and surprise! We found a lot of memory issues. We also found a lot of dead code that was still under active maintenance. This insight allowed us to target critical issues we didn’t realize we had!
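The post doesn’t say which tools that team used. As a stand-in, the Python below uses the standard library’s `tracemalloc` module to sketch an automated leak check: warm up an operation, then measure how much traced memory grows over many more calls. The iteration counts are hypothetical; in C or C++ the same role is usually filled by tools such as Valgrind or AddressSanitizer.

```python
import gc
import tracemalloc
from typing import Callable


def leaked_bytes(operation: Callable[[], None], warmup: int = 10, iterations: int = 1000) -> int:
    """Return how much traced memory grew across `iterations` calls of `operation`."""
    was_tracing = tracemalloc.is_tracing()
    if not was_tracing:
        tracemalloc.start()
    try:
        # Warm-up calls absorb one-time allocations such as caches.
        for _ in range(warmup):
            operation()
        gc.collect()
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            operation()
        gc.collect()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        if not was_tracing:
            tracemalloc.stop()
    return after - before
```

A test can then assert that growth stays under a small budget; an operation that holds on to memory on every call will blow well past it.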

System Testing/Long-Lived Testing

Systems are complex, production is complex, and where they interact is where things will break. Always. System tests typically lend themselves to ad hoc testing; in fact, when manual QA is the main method of software testing, system testing is the area manual QA covers. To build truly world-class software, you need testing at this layer. However, if you only test at this layer, you’ll cost yourself money, because you’ll find issues far too late in the cycle and far too removed from the code that caused them to resolve them quickly.

Long-lived testing tends to get overlooked as teams build out MVPs, and that shouldn’t be the case. Many years ago, I wrote some code to write to flash memory, but I messed up one of the loops. The flash part was rated for 50,000+ writes, but my bug cut the total number of writes down to around 5,000 before the part stopped working. Because my mentor insisted we build out a testing rig, we caught the bug well before shipping and saved the company tens of millions of dollars. Don’t skip your long-term testing!
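A real endurance rig is hardware-specific, so the Python below only simulates the idea. It models a hypothetical flash part with a rated write endurance and keeps writing until the part wears out or the test passes. The `writes_per_record` parameter stands in for a loop bug like the one described above: with a value of 10, a single block rated for 50,000 writes survives only 5,000 records.

```python
class SimulatedFlash:
    """Hypothetical flash part: a block fails once it has absorbed `rated_cycles` writes."""

    def __init__(self, blocks: int, rated_cycles: int) -> None:
        self.rated_cycles = rated_cycles
        self.wear = [0] * blocks

    def write_block(self, index: int) -> None:
        if self.wear[index] >= self.rated_cycles:
            raise OSError(f"block {index} worn out after {self.wear[index]} writes")
        self.wear[index] += 1


def write_record(flash: SimulatedFlash, record_no: int, writes_per_record: int = 1) -> None:
    """Write one logical record, rotating across blocks to spread wear."""
    block = record_no % len(flash.wear)
    # A correct loop writes once per record; a buggy loop that rewrites the
    # same block several times multiplies wear by `writes_per_record`.
    for _ in range(writes_per_record):
        flash.write_block(block)


def endurance_test(flash: SimulatedFlash, records: int, writes_per_record: int = 1) -> bool:
    """Long-lived test: return True if the part survives `records` logical writes."""
    try:
        for n in range(records):
            write_record(flash, n, writes_per_record)
    except OSError:
        return False
    return True
```

The point of the simulation is the shape of the test: it only catches the bug because it runs for the part’s full rated lifetime, which a short functional test never would.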

How to Use the Quality Filtration Stack

The one thing that takes a software quality testing framework from a checkbox to something truly used and loved is the ability to see and trust the results. If you take away only one thing from this post, let it be this: whatever quality stack or framework you build, your developers need to trust it, use it, and contribute to its growth.

The filtration stack is also a great way to think about how much it costs to fix a software issue within your test infrastructure. The following chart gives a rough view of how catching bugs early in the stack costs much less than catching them later. This is another benefit of an operational software quality stack: you can visually represent the cost to fix an issue!

[Figure: line graph of the cost to fix an issue, rising as the issue is caught higher in the stack]

What Makes for a Usable Stack?

The need to see and trust results has profound implications for the way the stack conveys quality information. Test results should be grouped by area and color coded (accounting for color blindness!) so that failures pop out. This allows a developer to quickly scan a Confluence page and click into a troubled area.
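As a hypothetical sketch of that grouping, the Python below rolls individual results up by area, sorts failing areas to the top, and labels each area with a text status so it reads correctly without relying on color alone. The `TestResult` shape and the label strings are assumptions; color can be layered on top wherever the summary is published.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TestResult:
    area: str
    name: str
    passed: bool


# A text label carries the status even for readers who can't distinguish the colors.
STATUS_LABEL = {True: "[PASS]", False: "[FAIL]"}


def summarize(results: list[TestResult]) -> list[str]:
    """Group results by area; areas with any failure sort first so they stand out."""
    by_area: dict[str, list[TestResult]] = defaultdict(list)
    for r in results:
        by_area[r.area].append(r)

    def sort_key(item: tuple[str, list[TestResult]]) -> tuple[bool, str]:
        area, rs = item
        return (all(r.passed for r in rs), area)  # False (has failures) sorts before True

    lines = []
    for area, rs in sorted(by_area.items(), key=sort_key):
        failed = sum(not r.passed for r in rs)
        lines.append(f"{STATUS_LABEL[failed == 0]}  {area}: {len(rs) - failed}/{len(rs)} passed")
    return lines
```

For example, one passing UI test plus one passing and one failing API test produce `[FAIL]  api: 1/2 passed` followed by `[PASS]  ui: 1/1 passed`.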

When someone clicks to go deeper, the error message should be right in front of the developer. It should be easy to see the log messages around the error and there should be links to additional logs, source code, etc. so that a person trying to diagnose the issue has most of the needed information to triage the bug at their fingertips.

Quality Engineering Mindset Across Everyone

As with many of the layers above, usability is necessary but not sufficient! You may find you need a triage person. This person looks at results, categorizes the issues, and gives feedback when there are too many incorrect results. Ideally, the triage role should rotate through the team; however, some testing areas may prove too difficult to truly rotate everyone through.

There is a good chance you will have people dedicated to certain areas of the filtration stack. For example, you may have a dedicated performance engineer who goes through and interprets performance data. Or you may have a person dedicated to third-party scanning tools, since many of these tools produce false positives and it takes someone with tool knowledge to parse and understand the results. Automation is an amazing capability, but it’s not perfect.

In a previous life, we had automation engineers build out a framework without involvement from development. The goal was always to have developers take ownership and add tests to the framework. This model has never worked for me. Automation engineers typically end up owning everything test-related including triage (and they won’t be happy). What has worked is developers building out the framework and taking ownership at each layer. Buy-in to the process and the work is the best way to build software. I’ll have more to say on software process in a later post.

CI/CD is required for a functional stack: for the filtration layers to really hum and run seamlessly, CI/CD pipelines are a necessary first step. I’ve had great success in the past having teams build out their CI/CD pipeline first and adopt a test-driven development model that mapped easily to every layer of the filtration stack. When I worked with a kernel development group, the team decided to build the CI/CD pipeline first. This added time to the initial delivery, but all their other work sped up because one of the most critical parts of the filtration stack (and of software delivery in general) was built first. To this day, those kernel teams ship one of the most stable products on the market.
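One way a pipeline can mirror the filtration stack is to run the layers in order, cheapest first, and stop at the first failure, so each bug is reported at the lowest layer that catches it. The Python below is a hypothetical sketch of that ordering, not the configuration format of any real CI system, where the same idea is usually expressed as sequential stages. The check functions are placeholders.

```python
from typing import Callable

# A layer is a name plus a check that returns True when the layer passes.
Layer = tuple[str, Callable[[], bool]]


def run_stack(layers: list[Layer]) -> tuple[bool, list[str]]:
    """Run filtration layers bottom-up and stop at the first failure.

    Returns whether every layer passed and the names of the layers that ran.
    """
    ran: list[str] = []
    for name, check in layers:
        ran.append(name)
        if not check():
            return False, ran
    return True, ran


# Placeholder checks standing in for real test suites.
stack = [
    ("code reviews and unit tests", lambda: True),
    ("functional tests", lambda: False),
    ("code scanning", lambda: True),
    ("system and long-lived tests", lambda: True),
]
ok, ran = run_stack(stack)  # ok is False; ran stops at "functional tests"
```

Failing fast keeps feedback quick and cheap; the tradeoff is that higher layers don’t run on a broken build, so slow layers such as long-lived tests are often also run on a schedule.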

Why This All Matters

The stakes get even higher when you consider how quickly software development lifecycles move and how rapidly we expect software vendors to fix issues and deliver new features. The world continues to move fast, and software hasn’t been able to fully keep up. Building to the MVP helped software applications keep pace for a while, but now we are seeing that process fray. Before we go too deeply into how the MVP process needs to reinvent itself, we’ll need to talk about the software development lifecycle. We’ll save that discussion for the next post.

Implementing the software quality layers of the stack does not guarantee you’ll have perfect software. It does, however, increase the probability of crafting better software. Not creating the quality layers does mean you will release code that is below quality standards.

For those of you in development building out your MVP: how many times have you left the automated tests until the end? How many times have you left the scale tests until the end because you needed something that could ship? At one startup, there was a push to get the database out the door and not worry about any type of scale or performance testing until the MVP was up. At this company, my role was distributed processing and synchronization. Thankfully, I did write some scaling tests before I started developing. This allowed me to pressure test my communication algorithms, which kept me from making several terrible mistakes in the initial design that, if left in, would have led to years of delays. Another engineer, who had neglected database performance, discovered that all the queries took too long. This led to huge shipping delays, and even after the product shipped, there were no good tests to pinpoint where long-running queries got stuck. Needless to say, the customers were not happy, and that company no longer exists.

Without a well-thought-out quality stack, you’ll find your product is harder to maintain, update, and fix. You’ll also find that you have a lot fewer customers. To badly paraphrase Tolstoy: software built with quality is all alike; software built without quality fails in its own unique way.

Allison Perkel, VP, Software Engineering

Allison Perkel is VP, Software Engineering at Capital One, where she leads an emerging line of business whose mission is to bring the amazing software built at Capital One to market. When not making the world a better place through technology, she can be found with her camera documenting the world around her. You may also find her cheering for the Yankees.