Jan 28, 2020

A Deep Dive Into Seamless Blue/Green Deployment Using AWS CodeDeploy

Before we go deep into the trenches of how to use Blue/Green deployment, let's try to understand what it is:

Blue/Green deployment is a deployment pattern with the intention of deploying a new version of an application/software without any downtime or with minimal risk. Blue/Green deployment is achieved by bringing up a similar stack and then deploying the new version of the application on this new stack. Traffic is moved from the current stack (which is called the Blue stack) to the new stack (which is called the Green stack).

Now that we’ve covered what it is, why should one go for Blue/Green deployment?

  • No downtime: You are moving the traffic from the Blue stack to the Green stack.
  • Easy rollback: If the Green stack isn’t healthy, you can follow the reverse process and move the traffic back to the Blue stack.
  • Reduced risk: You can validate the Green stack by running functional tests before you migrate the prod live traffic.

In this blog, we are going to dive deep into how we can do a Blue/Green deployment using AWS CodeDeploy service for ECS container tasks.

When AWS launched ECS back in April 2015, there was no out-of-box support for Blue/Green deployment. Engineers tried various options to work around this, including:

  • Swapping auto-scaling groups behind ELB.
  • Updating the auto-scaling group launch configurations.
  • DNS routing update using Route53.

In August 2016, AWS launched the Application Load Balancer (ALB), but ECS still had no built-in support for Blue/Green deployment. To work around this, my team at Capital One implemented our own homegrown solution, using a beta ALB for test traffic and then swapping the ALB listener rules to cut traffic over between Blue and Green tasks.

Fast forward to Nov 2018: Amazon ECS added official support for Blue/Green deployments using CodeDeploy.

 

Prerequisites for ECS Blue/Green Deployment

  • The ECS deployment type should be “blue-green”, that is, the service's deployment controller must be CodeDeploy (CODE_DEPLOY in the API).
  • The ECS service must be using the Application Load Balancer or the Network Load Balancer. We will be using the ALB in this blog.
  • The ALB should have a listener that will take prod traffic.
  • An optional test listener can be added to the load balancer, which is used to route test traffic. If you specify a test listener, CodeDeploy routes your test traffic to the replacement/Green task set during deployment.
  • Two ALB target groups should be created, one for the Blue tasks and another for the Green tasks.
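As a rough illustration of the prerequisites above, the sketch below builds the keyword arguments one might pass to the boto3 ECS `create_service` call. The helper name `build_service_request` and all argument values are hypothetical, and other settings a real service needs (for example the network configuration for Fargate) are deliberately left out.

```python
def build_service_request(cluster, service_name, task_definition_arn,
                          blue_target_group_arn, container_name, container_port):
    """Return keyword arguments for an ECS create_service call (illustrative only)."""
    return {
        "cluster": cluster,
        "serviceName": service_name,
        "taskDefinition": task_definition_arn,
        "desiredCount": 1,
        # Hands deployments for this service over to CodeDeploy (Blue/Green).
        "deploymentController": {"type": "CODE_DEPLOY"},
        # Assumption: only the Blue target group is attached here; the pair of
        # target groups is declared later on the CodeDeploy deployment group.
        "loadBalancers": [
            {
                "targetGroupArn": blue_target_group_arn,
                "containerName": container_name,
                "containerPort": container_port,
            }
        ],
    }

# Possible usage, assuming AWS credentials and existing resources:
#   import boto3
#   boto3.client("ecs").create_service(**build_service_request(...))
```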

 

CodeDeploy Blue/Green Deployment Flow

The assumption here is that the dev team is already using deployment automation to programmatically create an ECS service with an ALB and target groups, or is doing the setup through the AWS Console.

The below diagram shows initial deployment where only Blue tasks are running and taking 100% production traffic.

Initial Blue stack

Now, let’s look at a CodeDeploy-based Blue/Green deployment. You’ll notice from the diagram below that Green tasks start (they are the new version of the code) and are attached to Target Group 2. The ALB test traffic listener is now ready for test traffic on port 8443, and test traffic will be sent to Green tasks using Target Group 2. We can add a hook (a Lambda function) that runs once test traffic is ready through the test listener. The Lambda function can perform some functional testing on the ALB test listener port 8443 and will return either “succeeded” or “failed”.

Green traffic flowing through Test listener

Assuming the test traffic Lambda hook returned “succeeded,” the production traffic is routed to Target Group 2, which is in turn served by Green tasks (new code version). The ALB prod listener port 443 and test listener port 8443 both now point to Target Group 2. CodeDeploy will keep the Blue tasks for a pre-configured period so that a rollback remains possible, either from the CodeDeploy console or through a CLI/API call.

Prod traffic flowing through Green tasks

Once the pre-configured period has elapsed, CodeDeploy will terminate the Blue tasks, and after this point, rollback won’t be possible.

Blue tasks terminated
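The flow described above can be summarized as a small state model. This is only a teaching sketch, not anything CodeDeploy exposes: the `StackState` type, the phase names, and the `deployment_states` function are made up for illustration, and where the test listener points in the initial and rolled-back states is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackState:
    prod_target_group: str    # target group behind the prod listener (port 443)
    test_target_group: str    # target group behind the test listener (port 8443)
    running_task_sets: tuple  # which task sets are still running


def deployment_states(hook_succeeded, blue_tg="tg1", green_tg="tg2"):
    """Return the ordered (phase, state) pairs of one Blue/Green deployment."""
    states = [
        # Assumption: before the deployment both listeners point at Blue.
        ("initial", StackState(blue_tg, blue_tg, ("blue",))),
        # Green starts and only the test listener is moved to it.
        ("test_traffic", StackState(blue_tg, green_tg, ("blue", "green"))),
    ]
    if not hook_succeeded:
        # Assumption: a failed test hook leaves prod on Blue and removes Green.
        states.append(("rolled_back", StackState(blue_tg, blue_tg, ("blue",))))
        return states
    # Prod traffic is cut over; Blue is kept around for the rollback window.
    states.append(("prod_cutover", StackState(green_tg, green_tg, ("blue", "green"))))
    # After the wait period Blue is terminated and rollback is no longer possible.
    states.append(("blue_terminated", StackState(green_tg, green_tg, ("green",))))
    return states
```

Calling `deployment_states(True)` walks through the happy path, while `deployment_states(False)` stops at the rolled-back state with production traffic never having left the Blue tasks.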

Various CodeDeploy Resources

Let’s dive deep into the implementation details now. The below AWS CodeDeploy artifacts are required to support the ECS Blue/Green deployment:

  • CodeDeploy Application
  • CodeDeploy Deployment Group
  • CodeDeploy Deployment

So you can either create the above CodeDeploy resources from the AWS CodeDeploy console, or you can programmatically create them using CLI or language-specific APIs like Python/Boto3. Let’s walk through what that looks like:

Create a CodeDeploy Application Using Python/Boto3:

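import boto3

# Same application name that the deployment group and deployment examples below use
application_name = 'AppECS-sample-springboot-app-qa-bg'
cd_client = boto3.client('codedeploy')
response = cd_client.create_application(
   applicationName=application_name,
   computePlatform='ECS'
)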

Create a CodeDeploy deployment group using Python/Boto3:

response = cd_client.create_deployment_group(
   applicationName='AppECS-sample-springboot-app-qa-bg',
   deploymentGroupName='DgpECS-sample-springboot-app-qa-bg',
   deploymentConfigName='CodeDeployDefault.ECSAllAtOnce',
   serviceRoleArn='arn:aws:iam::123456789123:role/ecs_service_role',
   triggerConfigurations=[
       {
           'triggerName': 'sample-springboot-app-qa-code-deploy-bg-trigger',
           'triggerTargetArn': 'arn:aws:sns:us-east-1:123456789123:my_sns_topic',
           'triggerEvents': [
               "DeploymentStart",
               "DeploymentSuccess",
               "DeploymentFailure",
               "DeploymentStop",
               "DeploymentRollback",
               "DeploymentReady"
           ]
       },
   ],
   autoRollbackConfiguration={
       'enabled': True,
       'events': [
           'DEPLOYMENT_FAILURE', 'DEPLOYMENT_STOP_ON_ALARM', 'DEPLOYMENT_STOP_ON_REQUEST',
       ]
   },
   deploymentStyle={
       'deploymentType': 'BLUE_GREEN',
       'deploymentOption': 'WITH_TRAFFIC_CONTROL'
   },
   blueGreenDeploymentConfiguration={
       'terminateBlueInstancesOnDeploymentSuccess': {
           'action': 'TERMINATE',
           'terminationWaitTimeInMinutes': 15
       },
       'deploymentReadyOption': {
           'actionOnTimeout': 'CONTINUE_DEPLOYMENT'
       }
   },
   loadBalancerInfo={
       'targetGroupPairInfoList': [
           {
               'targetGroups': [
                   {
                       'name': 'sample-springboot-app-qa-tg1'
                   },
                   {
                       'name': 'sample-springboot-app-qa-tg2'
                   }
               ],
                # listenerArns must be a list of ARN strings, not a bare string
                'prodTrafficRoute': {
                    'listenerArns': ['arn:aws:elasticloadbalancing:us-east-1:123456789123:listener/app/sample-springboot-app-qa-alb/2b8b8ab60f9c7e43/97b643f12d4fa8a4']
                },
                'testTrafficRoute': {
                    'listenerArns': ['arn:aws:elasticloadbalancing:us-east-1:123456789123:listener/app/sample-springboot-app-qa-alb/2b8b8ed64f9c7e43/09261df9b5476d39']
                }
           },
       ]
   },
   ecsServices=[
       {
           'serviceName': 'sample-springboot-app-qa',
           'clusterName': 'my-test-cluster'
       }
   ]
)

Create a CodeDeploy Deployment:

response = cd_client.create_deployment(
   applicationName='AppECS-sample-springboot-app-qa-bg',
   deploymentGroupName='DgpECS-sample-springboot-app-qa-bg',
   revision={
       'revisionType': 'AppSpecContent',
       'appSpecContent': {
           'content': "APPSPEC FILE CONTENT",
           'sha256': "xxxxxx"
       }
   },
   ignoreApplicationStopFailures=False,
   autoRollbackConfiguration={
       'enabled': True,
       'events': [
           'DEPLOYMENT_FAILURE',
           'DEPLOYMENT_STOP_ON_ALARM',
           'DEPLOYMENT_STOP_ON_REQUEST'
       ]
   }
)

The CodeDeploy “Deployment” uses an AppSpec file, which is a YAML or JSON file (the example below is JSON) that provides the resource information and the execution hooks. The Resources section, as shown below, contains the Task Definition and container info. The Hooks section lets you add Lambda functions to be triggered at various points in the lifecycle of the CodeDeploy Blue/Green deployment (more details in a later section).

{
   "version": 0.0,
   "Resources": [
       {
           "TargetService": {
               "Type": "AWS::ECS::Service",
               "Properties": {
                   "TaskDefinition": "ECS-TASK-DEFINITION-ARN",
                   "LoadBalancerInfo": {
                       "ContainerName": "my-container",
                       "ContainerPort": 8080
                   }
               }
           }
       }
   ],
   "Hooks": [
       {
           "AfterAllowTestTraffic": "arn:aws:lambda:us-east-1:123456789123:function:my-green-ready-hook"
       }
   ]
}
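The `content` and `sha256` placeholders in the `create_deployment` call could be filled in along these lines. This is a minimal sketch: `build_appspec_revision` is a hypothetical helper, and it assumes the `sha256` field is the hex digest of the UTF-8 encoded AppSpec content.

```python
import hashlib
import json


def build_appspec_revision(task_definition_arn, container_name, container_port, hooks=None):
    """Build the `revision` argument for create_deployment from AppSpec fields.

    `hooks` is an optional dict mapping lifecycle event name -> Lambda ARN.
    """
    appspec = {
        "version": 0.0,
        "Resources": [
            {
                "TargetService": {
                    "Type": "AWS::ECS::Service",
                    "Properties": {
                        "TaskDefinition": task_definition_arn,
                        "LoadBalancerInfo": {
                            "ContainerName": container_name,
                            "ContainerPort": container_port,
                        },
                    },
                }
            }
        ],
    }
    if hooks:
        # The AppSpec expects a list with one {event: lambda_arn} entry per hook.
        appspec["Hooks"] = [{event: arn} for event, arn in hooks.items()]

    content = json.dumps(appspec)
    # Assumption: hex digest over the exact bytes that are sent as `content`.
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return {
        "revisionType": "AppSpecContent",
        "appSpecContent": {"content": content, "sha256": digest},
    }
```

The returned dictionary can then be passed as `revision=...` to `cd_client.create_deployment`, so the AppSpec lives in code next to the rest of the deployment automation.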

Blue/Green Deployment in Action: The Happy Path

Initial Setup:

Here we have an ECS service using Blue/Green deployment (powered by CodeDeploy) running a Fargate task.

Note that the ECS service is running based off ECS task definition version 132 and is running one Fargate task:

ECS - Before Blue/Green deployment

The ALB and the Target Groups attached to this ECS service:

ALB — Before Blue/Green deployment

The listener with port 443 is the PROD traffic listener and the port 8443 is the TEST traffic listener. At this point, both the listeners are attached to the same target group ending with “-tg2”.

Target Groups — Before Blue/Green deployment

Two target groups were created for this ECS service to support Blue/Green deployment using CodeDeploy. At this point, only the target group “-tg2” has a running target/task and is attached to the above ALB listeners.

Let's Initiate the Blue Green Deployment

First, we start the Blue/Green deployment for the ECS service. It can be started programmatically through the ECS API or from the AWS ECS console (the update ECS service feature), followed by creating a new deployment from the CodeDeploy console or the CodeDeploy API. The updated code version is specified in the new ECS task definition, which is used to update the ECS service.

As the ECS service’s deployment controller is CodeDeploy, deployment in CodeDeploy gets triggered and a new ECS task is started here.

CodeDeploy — Initiate Deployment

You can see below that a new ECS task has started and the running count is now 2. Under the Deployments tab, the PRIMARY entry is for the current Blue task and the ACTIVE entry is for the upcoming Green task.

ECS — Blue/Green deployment in progress in CodeDeploy

Green tasks are now up and running. The PROD traffic is still being served by the Blue task in the original stack. Rollback is still possible at this stage, but keep in mind that rollback will just terminate the Green task.

Code Deploy — Post creation of Green task

Now the ALB listeners have been updated. The PROD listener (Port-443) is still attached to target group “-tg2” and serving live traffic. The TEST listener (Port-8443) is attached to target group “-tg1”, which was previously not attached to the ALB at all. So you can hit the ALB DNS:8443 and test the Green stack, which is not taking any PROD traffic.

ALB — Now listeners are connected to PROD and TEST target groups

Now, both “-tg1” (taking TEST traffic) and “-tg2” (taking PROD traffic) are attached to the ALB serving TEST and PROD listeners.

Target Groups — Both target groups now active and attached to ALB
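The listener-to-target-group wiring can also be checked from code instead of the console. The sketch below parses a response shaped like the one returned by the boto3 `elbv2` `describe_listeners` call; `listener_target_groups` is a hypothetical helper, and it only handles simple forward actions that carry a single `TargetGroupArn`.

```python
def listener_target_groups(describe_listeners_response):
    """Map each listener port to the target group its default action forwards to."""
    routing = {}
    for listener in describe_listeners_response.get("Listeners", []):
        for action in listener.get("DefaultActions", []):
            # Weighted forwards (ForwardConfig with several groups) are ignored here.
            if action.get("Type") == "forward" and "TargetGroupArn" in action:
                routing[listener["Port"]] = action["TargetGroupArn"]
    return routing

# Possible usage, assuming AWS credentials and a real load balancer ARN:
#   import boto3
#   resp = boto3.client("elbv2").describe_listeners(LoadBalancerArn=alb_arn)
#   print(listener_target_groups(resp))
```

At this stage of the walkthrough one would expect port 443 to map to the “-tg2” target group and port 8443 to “-tg1”.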

PROD Traffic Moved to Green Task From Blue:

Traffic routing completes and the new Green task (or Replacement task) is now serving the PROD traffic. But how did traffic routing happen? Do we have any control over this? — Good questions and I will answer in a separate section :)

Rollback is still possible at this stage because the Blue task is still around and sitting idle. It will be available for the next 15 mins (this duration is configurable in the DeploymentGroup attribute “terminationWaitTimeInMinutes”, as shown in the previous section). But how does the rollback work? Again, a good question that I will explain in another section.

CodeDeploy — Traffic rerouting done

ECS is now running the task using the updated task definition version 133 (it was previously 132). Both the original and new tasks are still running, which is why rollback is still possible. The Deployments tab now shows that 100% of traffic is being served by the Replacement task, which is the new Green task.

ECS — Post traffic rerouting from blue to green

Now, both the ALB listeners PROD (Port-443) and TEST (Port-8443) are pointing to the target group “-tg1” which has the Green task as target.

ALB — Post traffic rerouting

Fast Forward 15 mins — Blue/Green Deployment Has Been Completed

Based on the CodeDeploy “DeploymentGroup” configuration “terminationWaitTimeInMinutes”, after 15 mins it will terminate the Blue task. Rollback is no longer possible now as the Blue task is gone!

CodeDeploy — Post termination of Blue/original task

The ECS running task count is back to 1 as the Blue/original task is gone. The Task definition version is “133” which means the new version is serving the PROD traffic. Under Deployments tab, the ACTIVE entry is now gone.
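Rather than watching the console for the deployment to finish, automation could poll the deployment status. The following is a minimal sketch built on the CodeDeploy `get_deployment` call; the helper name `wait_for_deployment`, the polling interval, and the retry limit are arbitrary choices, and it assumes the deployment only reaches a terminal status after the Blue task termination step.

```python
import time

# Statuses after which the deployment will not change any more.
TERMINAL_STATUSES = {"Succeeded", "Failed", "Stopped"}


def wait_for_deployment(cd_client, deployment_id, poll_seconds=30, max_polls=60,
                        sleep=time.sleep):
    """Poll CodeDeploy until the deployment reaches a terminal status.

    `cd_client` is a boto3 CodeDeploy client (or any object with a compatible
    get_deployment method). Returns the final status string.
    """
    for _ in range(max_polls):
        info = cd_client.get_deployment(deploymentId=deployment_id)["deploymentInfo"]
        if info["status"] in TERMINAL_STATUSES:
            return info["status"]
        sleep(poll_seconds)
    raise TimeoutError(f"deployment {deployment_id} still running after {max_polls} polls")
```

With the 15 minute termination wait used in this post, the defaults above (30 seconds times 60 polls) give roughly half an hour of polling before giving up.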

Blue/Green Deployment in Action: Rollback Flow

Let's Recap Our Current Stable PROD State:

  • ECS service is running 1 task using task definition version 133.

  • ALB listeners both PROD (Port-443) and TEST (Port-8443) pointing to target group “-tg1”.

  • Target group “-tg1” is active and serving PROD traffic and “-tg2” is not attached to the ALB.

We want to deploy a new version now and will roll back after the traffic has been rerouted, to demo the rollback flow:

Again, Let's Initiate the Blue/Green Deployment:

Let’s fast forward and review the state where the new Blue/Green deployment reroutes the traffic to Green stack and waits for 15 mins before terminating the Original/Blue task.

We have CodeDeploy Blue/Green deployment waiting to terminate the Blue task and PROD traffic being served by the new Green/Replacement task.

CodeDeploy — Post rerouting traffic

The ECS service is now running a new task using the new ECS task definition version 134 (we started this deployment with 133), and our ECS running task count is 2: one with ECS task def version 133 and another with the new ECS task def version 134. ECS Deployments should be showing 100% of traffic being served by the new PRIMARY task.

ECS Service - Deployment View

ALB listeners PROD (Port-443) and TEST (Port-8443) are both pointing to the target group “-tg2”. Previously they were attached to “-tg1”.

ALB - Listeners

So, Let's Assume the Deployment Didn’t Go Well and We Have to Roll Back Now…

Let's initiate the rollback from the CodeDeploy console by clicking the “Stop and rollback deployment” button:

Code Deploy - Stop and Rollback Message

The way the rollback works is that CodeDeploy stops the current deployment and skips the step of deleting the Original/Blue task. It then creates a new deployment to roll back the previous one and reroutes the traffic from the Replacement/Green task back to the Original/Blue task. It also terminates the Replacement/Green task.

Code Deploy - Rollback Deployment
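The console button has an API counterpart. Below is a minimal sketch using the CodeDeploy `stop_deployment` call with rollback enabled; the wrapper `stop_and_roll_back` is hypothetical and simply returns whatever status fields CodeDeploy sends back.

```python
def stop_and_roll_back(cd_client, deployment_id):
    """Stop an in-flight deployment and ask CodeDeploy to roll it back.

    `cd_client` is a boto3 CodeDeploy client (or a compatible object).
    Returns a (status, status_message) tuple taken from the response.
    """
    response = cd_client.stop_deployment(
        deploymentId=deployment_id,
        # Request a rollback to the previous revision when stopping.
        autoRollbackEnabled=True,
    )
    return response.get("status"), response.get("statusMessage")

# Possible usage, assuming AWS credentials and a running deployment:
#   import boto3
#   stop_and_roll_back(boto3.client("codedeploy"), "d-XXXXXXXXX")
```

As described earlier, this only helps while the Blue task is still around, that is, before the termination wait period has run out.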

Let’s Review What the ECS and ALB Setup Looks Like Post Rollback

The ECS service is now running a task using the original task def version 133 and the running task count is back to 1. The ECS Deployments tab is showing 100% of PROD traffic going through the PRIMARY task.

ECS Service - Post Rollback

The ALB listeners both PROD and TEST are back to the target group “-tg1”.

ALB - Listeners Post Rollback

So this completes the successful rollback!

 

How to Control the Traffic Routing Between Blue/Green Tasks?

CodeDeploy allows you to attach hooks to the Blue/Green deployment pipeline. The hooks are nothing but Lambda functions that you implement.

A few example scenarios are:

  • Running functional test on the Green stack before routing PROD traffic.
  • Some environmental setup like downloading/uploading files to S3 before PROD traffic migration.

List of Lifecycle Event Hooks for an Amazon ECS Deployment (Ref: here)

  • BeforeInstall — Used to run tasks before the replacement task set is created.
  • AfterInstall — Used to run tasks after the replacement task set is created and one of the target groups is associated with it.
  • AfterAllowTestTraffic — Used to run tasks after the test listener serves traffic to the replacement task set. The results of a hook function at this point can trigger a rollback.
  • BeforeAllowTraffic — Used to run tasks after the second target group is associated with the replacement task set, but before traffic is shifted to the replacement task set. The results of a hook function at this lifecycle event can trigger a rollback.
  • AfterAllowTraffic — Used to run tasks after the second target group serves traffic to the replacement task set. The results of a hook function at this lifecycle event can trigger a rollback.
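Since a mistyped hook name is easy to miss, the list above can double as a quick sanity check on the AppSpec Hooks section before a deployment is created. This is a small illustrative helper, not part of any AWS SDK; `validate_hooks` and its behavior are assumptions made for this sketch.

```python
# Lifecycle events for an Amazon ECS deployment, in the order listed above.
ECS_LIFECYCLE_HOOKS = (
    "BeforeInstall",
    "AfterInstall",
    "AfterAllowTestTraffic",
    "BeforeAllowTraffic",
    "AfterAllowTraffic",
)


def validate_hooks(hooks):
    """Check an AppSpec Hooks list and return its event names in lifecycle order.

    `hooks` has the AppSpec shape: [{"<event name>": "<lambda arn>"}, ...].
    Raises ValueError for an event name that is not an ECS lifecycle event.
    """
    seen = []
    for entry in hooks:
        for event in entry:
            if event not in ECS_LIFECYCLE_HOOKS:
                raise ValueError(f"unknown ECS lifecycle event: {event}")
            seen.append(event)
    # Sort by position in the lifecycle, regardless of the order in the file.
    return sorted(seen, key=ECS_LIFECYCLE_HOOKS.index)
```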

How can I add a lifecycle hook?

Lifecycle hook Lambda functions are added through the AppSpec file that you create and attach to the CodeDeploy Deployment object. Please refer to the AppSpec example in the section above.

My implementation for the automatic rollback involves:

  • Adding a hook for the lifecycle event “AfterAllowTestTraffic”.
  • The lambda function runs functional tests on the Green task using the ALB DNS + TEST listener Port (8443).
  • If the functional test passes (i.e., returns “Succeeded” to the CodeDeploy console), the Blue/Green deployment will continue and traffic will shift to the Green stack.
  • If the test fails (i.e.,  returns “Failed” to the CodeDeploy console), it will initiate an automatic rollback of the traffic to the Blue stack.
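The steps above could be sketched as a Lambda handler like the one below. It is a simplified illustration, not the exact function used in this setup: the functional test is reduced to a single HTTP GET, `make_hook_handler` and `run_functional_test` are hypothetical names, and the test URL is a placeholder. The result is reported back with the CodeDeploy `put_lifecycle_event_hook_execution_status` call, using the `DeploymentId` and `LifecycleEventHookExecutionId` fields that CodeDeploy passes in the hook event.

```python
import urllib.request


def run_functional_test(url, timeout=5):
    """Stand-in functional test: True if a GET on the Green stack returns 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError and socket timeouts
        return False


def _report_to_codedeploy(deployment_id, execution_id, status):
    # Imported lazily so the rest of the sketch can be exercised without AWS access.
    import boto3
    boto3.client("codedeploy").put_lifecycle_event_hook_execution_status(
        deploymentId=deployment_id,
        lifecycleEventHookExecutionId=execution_id,
        status=status,
    )


def make_hook_handler(test_url, probe=run_functional_test, report=_report_to_codedeploy):
    """Create a Lambda handler for the AfterAllowTestTraffic hook."""
    def handler(event, context):
        # "Succeeded" lets the deployment continue; "Failed" triggers the rollback.
        status = "Succeeded" if probe(test_url) else "Failed"
        report(event["DeploymentId"], event["LifecycleEventHookExecutionId"], status)
        return status
    return handler


# Hypothetical test endpoint: ALB DNS name plus the TEST listener port.
lambda_handler = make_hook_handler("https://my-alb.example.com:8443/health")
```

Passing the probe and the reporter in as arguments keeps the handler easy to try out locally with fakes before wiring it into a real deployment.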

 

TL;DR

ECS service deployment using AWS CodeDeploy is a powerful combination that provides easy and robust Blue/Green deployment support.

The additional deployment lifecycle hooks give you the flexibility to control the traffic routing policy per your requirements.

If you are already extensively using AWS services, and ECS is your container deployment platform, you should definitely consider this a go-to architecture over homegrown solutions.

If you want to learn about Canary deployment patterns using existing AWS services, please take a look here. It’s an excellent post on this topic!

 

Avijit Sarkar
Lead Software Engineer

DISCLOSURE STATEMENT: © 2020 Capital One. Opinions are those of the individual author. Unless noted otherwise in this post, Capital One is not affiliated with, nor endorsed by, any of the companies mentioned. All trademarks and other intellectual property used or displayed are property of their respective owners.