Enterprise release checklist for applications (illustrated)
Sleep better, have fun, & create value by better managing your pre-releases, releases, & monitoring
Words by Noah Mandelbaum, Distinguished Engineer
Illustrations by Mike Damrath, Senior Manager
It takes a large amount of effort to develop applications.
We stand up and then we sit down. We work for weeks to write clean code and to achieve solid code coverage. We think hard about how we can make our applications maintainable (using the ideas that Santhi Sridharan laid out in her article on developing maintainable software). Sometimes, we refactor until our vision becomes slightly blurry.
And few things are as frustrating as finding out on release that the application that you lovingly labored to create is flawed or broken in some way.
This broken software can create a negative customer experience, can cause business risk, and can generate a large amount of additional work for teams. So how can we help ensure our releases are more successful?
This checklist tries to strike a balance between having no formal process and the comprehensiveness of Release It! Answering the questions on this checklist can help your team achieve successful releases - plus, you might even smile while reading it or looking at our hand-drawn sketches (even if you hate off-hours releases).
Note: We are two software engineers working on systems that are keenly important to both customers and employees at Capital One. We developed this release checklist to help our teams be better prepared for their releases. We know this is not perfect for all use cases and that you might need to modify it for your team - but we hope it will help you as well.
The illustrated enterprise application release checklist
In 1735, Benjamin Franklin coined the phrase “an ounce of prevention is worth a pound of cure” in explaining to the citizens of Philadelphia the best way to prevent house fires.
To help prevent your release from becoming its own house (or dumpster) fire, we have divided this checklist into three sections: pre-release, release, and monitoring.
We try to follow two general rules as we prepare business intent for a release:
- Limit the number of items in a release to limit risk, and plan to release frequently.
- Make sure we have completed our definition of done for all work:
  - Written unit tests with proper code coverage.
  - Executed automated functional tests to validate new features and to make sure old features don’t regress.
  - Completed usability/accessibility testing.
  - Carried out performance testing (this is particularly important).
  - Scanned our code for security vulnerabilities - in particular, those that are found on the OWASP Top Ten list.
  - Documented any technical debt we ran up.
Beyond this, when getting ready for a release, there are a number of technical items we look at:
- Is the code we wish to deploy checked into version control and tagged appropriately?
- Did we update our production configuration correctly?
- If our code has dependencies (like microservices, databases, etc.), are those dependencies deployed in production and ready to take traffic?
- Have we made sure the infrastructure we need is provisioned in production with the right network configuration - including DNS, load balancers and (in our case) other AWS resources?
- Have we tested what happens to our application when it experiences heavy load - does it scale or does it collapse?
- Have we tested what happens to our user experience when we deploy a new version in the middle of the day - do our users experience any errors?
- Have we set up any feature toggles we require for canary or A/B testing?
- Have we accounted for access control for any application that requires it?
- Have we verified that the observability information we need to debug in production will be available? Are we able to capture and analyze metrics, traces, and logs? (There are plenty of commercial and open source tools that can help in this space.)
- Is there a rollback plan in place? Can the feature be toggled off if needed? Can we revert to the previous code version if needed? Can the users work around problems if the release goes sideways?
We try to always make sure we have a Plan B and Plan C in place!
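Feature toggles do double duty on this list: they enable canary or A/B rollouts, and they can act as a kill switch if a release goes sideways. The Python sketch below is a minimal, hypothetical in-process toggle (the `FeatureToggle` class and its fields are our own illustration, not any particular product's API). It assumes users are identified by a string ID and hashes that ID into a stable bucket from 0 to 99, so the same user consistently lands either in or out of the canary.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FeatureToggle:
    """Hypothetical in-process feature toggle (illustration only)."""

    name: str
    enabled: bool = True  # kill switch: False turns the feature off for everyone
    rollout_percent: int = 0  # share of remaining users in the canary, 0-100
    allow: set = field(default_factory=set)  # user IDs always included (e.g. verifiers)
    deny: set = field(default_factory=set)  # user IDs always excluded

    def _bucket(self, user_id: str) -> int:
        # Hash toggle name + user ID so each user gets a stable bucket in 0-99.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % 100

    def is_on(self, user_id: str) -> bool:
        # The kill switch and the deny list win over everything else.
        if not self.enabled or user_id in self.deny:
            return False
        # Named verifiers always see the feature.
        if user_id in self.allow:
            return True
        # Everyone else is in the canary only if their bucket falls under the rollout percentage.
        return self._bucket(user_id) < self.rollout_percent


# Usage: a 10% canary that always includes one verifier and always excludes one user.
toggle = FeatureToggle(
    "new-checkout", rollout_percent=10, allow={"verifier-1"}, deny={"opted-out-user"}
)
```

Setting `toggle.enabled = False` turns the feature off for everyone without a redeploy, which is one form of toggle-based rollback. Hashing the toggle name together with the user ID gives different toggles independent canary populations.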
When getting ready for a release, people turn out to be important, too:
- Do we have a list of key technical contacts we can reach quickly in case our release doesn’t work out as we hoped?
- Do we know what impact our changes will have on which users?
- Have we tested that our enabled feature toggles include the right people and exclude the right people?
- Have we tested that access control includes the right people and excludes the right people?
- Have we lined up people to help verify our release? People like:
  - Product manager(s).
  - Intent owners.
  - Support engineers.
  - People who should see our changes, and people who should not see them.
- Have we completed other release preparations that might make our rollout a success? For example:
  - Updated our training and documentation?
  - Made public announcements about the upcoming release?
  - Sent communications to people who might be interested in the changes?
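Checking that toggles and access control include the right people and exclude the right people can be partly automated before human verifiers sign off. As a rough sketch, assuming you can express "can this user see the change?" as a function, a small harness can compare that function against the lists of people who should and should not see the change. The names `check_targeting`, `GROUPS`, and `pilot_only` are hypothetical.

```python
from typing import Callable, Iterable


def check_targeting(
    can_see: Callable[[str], bool],
    should_see: Iterable[str],
    should_not_see: Iterable[str],
) -> list[str]:
    """Return human-readable problems; an empty list means targeting looks right."""
    problems = []
    # People who are supposed to see the change but are left out.
    for user in should_see:
        if not can_see(user):
            problems.append(f"{user} should see the change but does not")
    # People who are supposed to be excluded but can see the change anyway.
    for user in should_not_see:
        if can_see(user):
            problems.append(f"{user} should NOT see the change but does")
    return problems


# Hypothetical rule: the new feature is limited to users in the "pilot" group.
GROUPS = {"ana": "pilot", "ben": "pilot", "cal": "general"}


def pilot_only(user: str) -> bool:
    return GROUPS.get(user) == "pilot"


issues = check_targeting(pilot_only, should_see=["ana", "ben"], should_not_see=["cal"])
```

Because the harness accepts any predicate, the same check can be pointed at a toggle rule or an access control rule, with the lists supplied by the people lined up to verify the release.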
A lot of big enterprises require a more formal change control process. We happen to work in one of these places, so sometimes we might have to create a change request, including:
- Making sure our request to make the change was entered into the change control system.
- Making sure our change request does not conflict with other critical items in the change control system like no-release periods.
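Checking a proposed change window against no-release periods comes down to simple interval-overlap logic. This is a minimal sketch with made-up freeze dates. In practice the periods would come from your change control system, and `conflicts` and `FREEZES` are hypothetical names.

```python
from datetime import datetime, timedelta

# Made-up no-release (freeze) periods as (start, end) pairs, for illustration only.
FREEZES = [
    (datetime(2024, 11, 25), datetime(2024, 12, 2)),
    (datetime(2024, 12, 20), datetime(2025, 1, 3)),
]


def conflicts(
    start: datetime, duration: timedelta, freezes=FREEZES
) -> list[tuple[datetime, datetime]]:
    """Return every freeze period that overlaps the planned change window."""
    end = start + duration
    # Two intervals overlap when each one starts before the other one ends.
    return [(f_start, f_end) for f_start, f_end in freezes if start < f_end and f_start < end]


# A two-hour change that runs into the first freeze period.
clash = conflicts(datetime(2024, 11, 24, 23), timedelta(hours=2))
```

An empty list means the window is clear of the listed freezes; any returned period is something to resolve before the change request goes in.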
Having survived countless manual deployments earlier in our lives, we can say that nothing beats a good continuous integration/continuous delivery (CI/CD) pipeline. If you don’t have one, we recommend getting one; otherwise, you may be stuck with a manual playbook and the huge headaches that come with it.
We are fortunate enough to have a CI/CD pipeline - so, while a release is occurring, we:
- Verify that our deployment job has successfully completed by looking at our deployment logs.
- Check that all code changes (and associated dependency changes) occurred in the expected manner.
- Confirm that all activity has drained completely from the old code and that the new code is now handling our traffic.
- Analyze metrics and logs (and maybe traces) to make sure no outages or errors occurred during the deployment.
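Parts of this verification can be scripted. The sketch below assumes a metrics tool can report request and error counts per code version over the same time window. `VersionStats`, `deployment_healthy`, and the 1% error-rate threshold are illustrative assumptions, not recommendations for any particular system.

```python
from dataclasses import dataclass


@dataclass
class VersionStats:
    """Request and error counts for one code version over the same window (assumed input)."""

    requests: int
    errors: int


def deployment_healthy(
    old: VersionStats, new: VersionStats, max_error_rate: float = 0.01
) -> tuple[bool, str]:
    # 1. The old code should have fully drained: no requests still landing on it.
    if old.requests > 0:
        return False, f"old version still serving {old.requests} requests"
    # 2. The new code should actually be taking traffic.
    if new.requests == 0:
        return False, "new version is not receiving traffic"
    # 3. The new version's error rate should stay under the agreed threshold.
    rate = new.errors / new.requests
    if rate > max_error_rate:
        return False, f"new version error rate {rate:.2%} exceeds {max_error_rate:.2%}"
    return True, "traffic drained and error rate within threshold"
```

A script like this complements, rather than replaces, a human looking at the deployment logs and dashboards.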
Just as important for us is the user experience - so, we ask ourselves the following questions:
- Have the people we have lined up for verification provided confirmation that they can see what they should see? Likewise, have people verified they cannot see what they should not see?
- Have the people who verified our system noted any other anomalies - an error screen, unusual latency, an unexpected navigation flow? Sometimes, this provides an almost imperceptible signal that something has gone wrong.
In the unfortunate event that our deployment is unsuccessful, we execute our rollback plan.
Monitoring (not a) checklist
There are times when a release initially appears healthy but then encounters problems later. This is especially likely if the release involves a database or a REST API change.
We have found that if we set up our observability well, we can set alarms that will notify us proactively if errors or other anomalies occur. These alarms alert us if we see unexpected behavior from our infrastructure, our code, and around critical transactions for our systems. For releases that require more granular verification, we may also scan logs for unexpected behavior.
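One common way to keep alarms from firing on a single blip is to require several breaching datapoints within a recent window, similar in spirit to the "M out of N" datapoints setting in Amazon CloudWatch alarms. The class below is a hypothetical, self-contained sketch of that logic. The `ErrorRateAlarm` name, the threshold, and the window sizes are placeholders you would tune for your own critical transactions.

```python
from collections import deque


class ErrorRateAlarm:
    """Fire when at least m of the last n datapoints breach the threshold (illustrative)."""

    def __init__(self, threshold: float, m: int = 3, n: int = 5):
        self.threshold = threshold
        self.m = m
        # A bounded deque keeps only the most recent n breach/no-breach flags.
        self.window = deque(maxlen=n)

    def observe(self, error_rate: float) -> bool:
        """Record one datapoint (e.g. a one-minute error rate); return True if the alarm fires."""
        self.window.append(error_rate > self.threshold)
        # True counts as 1 and False as 0, so the sum is the number of breaches in the window.
        return sum(self.window) >= self.m


# Usage: alarm when 3 of the last 5 datapoints exceed a 2% error rate.
alarm = ErrorRateAlarm(threshold=0.02, m=3, n=5)
```

Requiring several breaches trades a little detection speed for far fewer false pages, which matters when the alarm is what wakes someone up.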
That being said, we also pay particular attention to the first 24 hours after a release, especially if the release took place during a low-traffic period.
If this was the case, sometimes we ask the appropriate users to reverify the release functionality and provide verbal/written confirmation of success.
While this release checklist is not necessarily comprehensive for all teams and use cases, hopefully it has entertained you. And hopefully, it will help you in getting your work deployed to your users.