When we talk about DevOps at Capital One, we don't talk about the general definition of DevOps. We talk about the goal of DevOps: to deliver high quality working software faster. Instead of asking what DevOps is, we focus on why DevOps is important to us. We break down "delivering high quality working software faster" and focus on the words and phrases that matter most to us:
- High quality means no security flaws, full compliance, and minimal defects.
- Working means the software works end to end for all parties, has been tested, and has all of its dependencies satisfied.
- Faster means as soon as possible without sacrificing quality.
Now look at the first two phrases: nothing there has changed with the advent of DevOps. Waterfall processes have been producing high quality, working software for years. What's new is the last word: faster. Before DevOps, we did one release per quarter; now we do one per day, per week, or per sprint.
Is that fast enough? How fast is faster? How do we measure it and where do we stop?
High Quality vs Faster
Industry wisdom says that the faster you go, the better you get. I started thinking about scientific backing for this idea and places where I had seen it before. It turns out Daniel Bernoulli demonstrated a related principle in the 18th century, long before any of us were born. His principle of fluid mechanics tells us that when a continuous flow of fluid passes through a constriction, the fluid speeds up and its pressure drops.
You can draw a parallel: if you have a pipeline from commit to deployment, you can increase the flow (and lower the pressure on your developers) by delivering smaller chunks of code at a time. This is a basic principle of agile DevOps that you may be familiar with, even if you aren't familiar with the fluid mechanics behind it.
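To make the batch-size intuition concrete, here is a toy model of my own (not from the original post): if one change is committed per time unit and a release ships every `batch_size` commits, the average time a change waits from commit to release grows with the batch size.

```python
def average_lead_time(batch_size, release_overhead=1.0):
    """Average time a change waits from commit to release.

    Toy model: one change is committed per time unit; a release ships
    once `batch_size` commits have accumulated and takes
    `release_overhead` time units to complete.
    """
    # The i-th commit in a batch waits for the remaining commits,
    # then for the release itself.
    waits = [(batch_size - 1 - i) + release_overhead for i in range(batch_size)]
    return sum(waits) / batch_size

for b in (1, 5, 20):
    print(b, average_lead_time(b))  # 1 -> 1.0, 5 -> 3.0, 20 -> 10.5
```

Under this (deliberately simplified) model, shrinking the batch from 20 commits to 1 cuts the average lead time by an order of magnitude, which is the "constrict the flow to speed it up" effect in miniature.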
Three Types of Pipelines
Since we started our DevOps journey, I've seen many pipelines both inside and outside of Capital One. The problematic ones fall into three broad categories, which I'll describe with some visual aids.
Type 1: This pipeline involves a bunch of parallel branches that go on forever. Where do they meet? They don’t. It’s an optical illusion, they meet at what mathematicians call infinity. This is a result of a poor branching strategy.
Type 2: This involves complicated pipelines of interdependent components put together like so. Each component has its own code repository and branches, and you can't always figure out where the pipeline starts and where it stops.
Type 3: This is a pipeline that requires an army to manage. It has holes that leak, test cases that fail, and builds that break, and someone has to fix all of this by hand.
Creating Better Pipelines
So how do we design, measure, and improve our pipelines to avoid the above?
At Capital One, we design pipelines using the concept of the “16 Gates”. These are our guiding design principles and they are:
- Source code version control
- Optimum branching strategy
- Static analysis
- >80% code coverage
- Vulnerability scan
- Open source scan
- Artifact version control
- Auto provisioning
- Immutable servers
- Integration testing
- Performance testing
- Build deploy testing automated for every commit
- Automated rollback
- Automated change order
- Zero downtime release
- Feature toggle
These gates are used to understand each and every product’s progress through the DevOps process.
Pipeline measurement is another area we started focusing on and are still researching. We worked on pinpointing where stoppages were happening and broadened our focus from speeding up active work to also identifying and reducing wait times. After all, you never know where the wait time is and which wait time matters to you. I've seen developers work hard to cut their build time from 25 minutes to 15 while their test cases ran for hours. In that case, should you spend your energy speeding up the build, or reducing the time for testing? We let developers choose which wait time to focus on by making things transparent via dashboards and reports.
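A minimal wait-time breakdown might look like the following sketch; the stage names and timings here are invented for illustration, not actual Capital One data:

```python
# Hypothetical per-stage timings (in minutes) for one pipeline run.
stages = [
    {"stage": "build",       "active": 15, "wait": 2},
    {"stage": "unit tests",  "active": 10, "wait": 5},
    {"stage": "integration", "active": 30, "wait": 180},
    {"stage": "deploy",      "active": 5,  "wait": 45},
]

def biggest_wait(stages):
    """Name the stage where reducing wait time pays off most."""
    return max(stages, key=lambda s: s["wait"])["stage"]

total_active = sum(s["active"] for s in stages)  # 60 minutes of work
total_wait = sum(s["wait"] for s in stages)      # 232 minutes of waiting
print(biggest_wait(stages))  # integration
```

In this made-up run, the pipeline spends almost four times as long waiting as working, and the build stage everyone instinctively optimizes is nowhere near the biggest lever; surfacing numbers like these on a dashboard is what lets developers pick the right wait time to attack.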
Additionally, we found two areas of opportunity: our branching strategy and our process.
Pipeline improvement is really process improvement. It's largely focused on automating the release process and revisiting audit and compliance. Traditionally, release management in a big enterprise is complicated; the process stresses governance and risk mitigation. We've worked with our auditors and risk compliance office to understand how the process can be improved and fully automated. Collectively we have come to an agreement that:
- DevOps and CI/CD can provide better controls around risk and security mitigation than manual processes.
- A core set of practices and requirements can be fully automated in and around the pipeline to satisfy and further strengthen audit and compliance.
The final goal of DevOps is a mature, fully automated pipeline that goes from code commit all the way to production. We have created a model where you can continuously audit your pipeline using the vast set of data points it produces. We have already started open sourcing this work under our Hygieia project.
This has allowed us to live up to the DevOps goal of delivering high quality working software faster, without trading off the quality or speed of our pipeline.