A Deep Dive Into Seamless Blue/Green Deployment Using AWS CodeDeploy


Before we go deep into the trenches of how to use Blue/Green deployment, let's try to understand what it is:

Blue/Green deployment is a deployment pattern with the intention of deploying a new version of an application/software without any downtime or with minimal risk. Blue/Green deployment is achieved by bringing up a similar stack and then deploying the new version of the application on this new stack. Traffic is moved from the current stack (which is called the Blue stack) to the new stack (which is called the Green stack).

Now that we’ve covered what it is, why should one go for Blue/Green deployment?

  • No downtime: You are moving the traffic from the Blue stack to the Green stack.
  • Easy rollback: If the Green stack isn’t healthy, you can follow the reverse process and move the traffic back to the Blue stack.
  • Reduced risk: You can validate the Green stack by running functional tests before you migrate the prod live traffic.

In this blog, we are going to dive deep into how we can do a Blue/Green deployment using AWS CodeDeploy service for ECS container tasks.

When AWS launched ECS back in April 2015, there was no out-of-the-box support for Blue/Green deployment. Engineers tried various options to work around this, including:

  • Swapping auto-scaling groups behind ELB.
  • Updating the auto-scaling group launch configurations.
  • DNS routing update using Route53.

In August 2016, AWS launched the Application Load Balancer (ALB), but ECS still did not support Blue/Green deployment. To work around this, my team at Capital One implemented our own homegrown solution for Blue/Green deployment, using a beta ALB for test traffic and then swapping out the ALB listener rules to cut traffic over between Blue/Green tasks.

Fast forward to Nov 2018, Amazon ECS added official support for Blue/Green deployments using CodeDeploy.

 

Prerequisites for ECS Blue/Green Deployment

  • The ECS deployment type should be “blue-green”.
  • The ECS service must be using the Application Load Balancer or the Network Load Balancer. We will be using the ALB in this blog.
  • The ALB should have a listener that will take prod traffic.
  • An optional test listener can be added to the load balancer, which is used to route test traffic. If you specify a test listener, CodeDeploy routes your test traffic to the replacement/Green task set during deployment.
  • Two ALB target groups should be created, one for the Blue tasks and another for the Green tasks.
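
The prerequisites above can be checked before a deployment is attempted. The following is a minimal sketch, assuming the `service` dict has the shape of one entry from the ECS `describe_services` response and `listeners` the shape of the ELBv2 `describe_listeners` response; the function name and the `target_groups` argument are hypothetical and chosen for illustration.

```python
def check_blue_green_prerequisites(service, listeners, target_groups):
    """Return a list of problems; an empty list means all checks passed.

    service       -- one entry of ecs.describe_services(...)["services"]
    listeners     -- elbv2.describe_listeners(...)["Listeners"]
    target_groups -- names (or ARNs) of the target groups created for the service
    """
    problems = []

    # The ECS service must hand deployments over to CodeDeploy.
    controller = service.get("deploymentController", {}).get("type", "ECS")
    if controller != "CODE_DEPLOY":
        problems.append(
            "deployment controller is %s, expected CODE_DEPLOY" % controller
        )

    # The service must sit behind an ALB or NLB target group.
    if not service.get("loadBalancers"):
        problems.append("service is not attached to a load balancer")

    # At least one listener is needed for prod traffic (a second one is optional).
    if not listeners:
        problems.append("load balancer has no listener for prod traffic")

    # One target group for the Blue tasks and one for the Green tasks.
    if len(target_groups) < 2:
        problems.append("two target groups are required, one Blue and one Green")

    return problems
```

An empty result only means these basic checks passed; it does not validate IAM roles or listener rules.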

 

CodeDeploy Blue/Green Deployment Flow

The assumption here is that the dev team is already using deployment automation to programmatically create an ECS service with an ALB and target groups, or is doing the setup using the AWS Console.

The below diagram shows initial deployment where only Blue tasks are running and taking 100% production traffic.

Diagram of initial blue stack using black and red logos and blue arrows

Initial Blue stack

Now, let’s look at a CodeDeploy-based Blue/Green deployment. You’ll notice from the diagram below that Green tasks start (they are the new version of the code) and are attached to Target Group 2. The ALB test traffic listener is now ready for test traffic on port 8443, and test traffic will be sent to the Green tasks using Target Group 2. We can add a hook (a Lambda function) that runs once test traffic is ready through the test listener. The Lambda function can perform some functional testing on the ALB test listener port 8443 and return either “succeeded” or “failed”.

Diagram of green traffic flowing through Test listener using black and red logos and blue and green arrows

Green traffic flowing through Test listener

Assuming the test traffic lambda hook returned “succeeded,” the production traffic is routed to Target Group 2, which is in turn served by Green tasks (new code version). The ALB prod listener port 443 and test listener port 8443 both now point to Target Group 2. CodeDeploy will keep the Blue tasks for a pre-configured period so that a rollback can be possible either from the CodeDeploy console or through CLI/API call.

Diagram of prod traffic flowing through Green tasks using black and red logos and blue and green arrows

Prod traffic flowing through Green tasks

Once the pre-configured period has elapsed, CodeDeploy will terminate the Blue tasks, and after this point, rollback won’t be possible.

Diagram of blue tasks terminated using black and red logos and blue and green arrows

Blue tasks terminated
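
The four stages shown in the diagrams can be summarized as a tiny in-memory model. This is only an illustrative sketch of the state transitions, not how CodeDeploy is implemented; the class, function, and target group names are made up for the example.

```python
from dataclasses import dataclass, replace

BLUE_TG, GREEN_TG = "target-group-1", "target-group-2"


@dataclass(frozen=True)
class Stack:
    prod_listener: str = BLUE_TG   # target group behind port 443
    test_listener: str = BLUE_TG   # target group behind port 8443
    blue_running: bool = True
    green_running: bool = False


def start_green(stack):
    """Green tasks start and the test listener is pointed at them."""
    return replace(stack, green_running=True, test_listener=GREEN_TG)


def rollback(stack):
    """Send all traffic back to Blue; only possible while Blue still exists."""
    if not stack.blue_running:
        raise RuntimeError("Blue tasks are terminated; rollback is not possible")
    return replace(
        stack, prod_listener=BLUE_TG, test_listener=BLUE_TG, green_running=False
    )


def shift_prod_traffic(stack, tests_passed):
    """Cut prod traffic over to Green if the test hook succeeded, else roll back."""
    if not tests_passed:
        return rollback(stack)
    return replace(stack, prod_listener=GREEN_TG)


def terminate_blue(stack):
    """After the wait period, the Blue tasks are removed."""
    return replace(stack, blue_running=False)
```

Walking a `Stack()` through `start_green`, `shift_prod_traffic`, and `terminate_blue` reproduces the sequence of diagrams, and calling `rollback` after `terminate_blue` raises an error, mirroring the point of no return described above.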

Various CodeDeploy Resources

Let’s dive deep into the implementation details now. The below AWS CodeDeploy artifacts are required to support the ECS Blue/Green deployment:

  • CodeDeploy Application
  • CodeDeploy Deployment Group
  • CodeDeploy Deployment
Blue, red, and green flowchart with white text and blue arrows, detailing AWS CodeDeploy artifacts

You can either create the above CodeDeploy resources from the AWS CodeDeploy console, or you can create them programmatically using the CLI or language-specific APIs like Python/Boto3. Let’s walk through what that looks like:

Create a CodeDeploy Application Using Python/Boto3:

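    import boto3

    cd_client = boto3.client('codedeploy')

    # The same application name is used by the deployment group and the deployment.
    application_name = 'AppECS-sample-springboot-app-qa-bg'
    response = cd_client.create_application(
        applicationName=application_name,
        computePlatform='ECS'
    )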
  

Create a CodeDeploy deployment group using Python/Boto3:

    response = cd_client.create_deployment_group(
        applicationName='AppECS-sample-springboot-app-qa-bg',
        deploymentGroupName='DgpECS-sample-springboot-app-qa-bg',
        deploymentConfigName='CodeDeployDefault.ECSAllAtOnce',
        serviceRoleArn='arn:aws:iam::123456789123:role/ecs_service_role',
        # Publish deployment lifecycle notifications to an SNS topic.
        triggerConfigurations=[
            {
                'triggerName': 'sample-springboot-app-qa-code-deploy-bg-trigger',
                'triggerTargetArn': 'arn:aws:sns:us-east-1:123456789123:my_sns_topic',
                'triggerEvents': [
                    "DeploymentStart",
                    "DeploymentSuccess",
                    "DeploymentFailure",
                    "DeploymentStop",
                    "DeploymentRollback",
                    "DeploymentReady"
                ]
            },
        ],
        autoRollbackConfiguration={
            'enabled': True,
            'events': [
                'DEPLOYMENT_FAILURE', 'DEPLOYMENT_STOP_ON_ALARM', 'DEPLOYMENT_STOP_ON_REQUEST',
            ]
        },
        deploymentStyle={
            'deploymentType': 'BLUE_GREEN',
            'deploymentOption': 'WITH_TRAFFIC_CONTROL'
        },
        blueGreenDeploymentConfiguration={
            # Keep the Blue tasks for 15 minutes after cutover so rollback stays possible.
            'terminateBlueInstancesOnDeploymentSuccess': {
                'action': 'TERMINATE',
                'terminationWaitTimeInMinutes': 15
            },
            'deploymentReadyOption': {
                'actionOnTimeout': 'CONTINUE_DEPLOYMENT'
            }
        },
        loadBalancerInfo={
            'targetGroupPairInfoList': [
                {
                    'targetGroups': [
                        {
                            'name': 'sample-springboot-app-qa-tg1'
                        },
                        {
                            'name': 'sample-springboot-app-qa-tg2'
                        }
                    ],
                    # listenerArns must be a list, even for a single listener.
                    'prodTrafficRoute': {
                        'listenerArns': ['arn:aws:elasticloadbalancing:us-east-1:123456789123:listener/app/sample-springboot-app-qa-alb/2b8b8ab60f9c7e43/97b643f12d4fa8a4']
                    },
                    'testTrafficRoute': {
                        'listenerArns': ['arn:aws:elasticloadbalancing:us-east-1:123456789123:listener/app/sample-springboot-app-qa-alb/2b8b8ed64f9c7e43/09261df9b5476d39']
                    }
                },
            ]
        },
        ecsServices=[
            {
                'serviceName': 'sample-springboot-app-qa',
                'clusterName': 'my-test-cluster'
            }
        ]
    )
  

Create a CodeDeploy Deployment:

    response = cd_client.create_deployment(
        applicationName='AppECS-sample-springboot-app-qa-bg',
        deploymentGroupName='DgpECS-sample-springboot-app-qa-bg',
        revision={
            'revisionType': 'AppSpecContent',
            'appSpecContent': {
                # Placeholders: the AppSpec file content as a string, and its SHA-256 hash.
                'content': "APPSPEC FILE CONTENT",
                'sha256': "xxxxxx"
            }
        },
        ignoreApplicationStopFailures=False,
        autoRollbackConfiguration={
            'enabled': True,
            'events': [
                'DEPLOYMENT_FAILURE',
                'DEPLOYMENT_STOP_ON_ALARM',
                'DEPLOYMENT_STOP_ON_REQUEST'
            ]
        }
    )
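
Once a deployment is created, a script usually needs to know how it ended. The following is a minimal polling sketch, assuming the deployment ID comes from the `deploymentId` field of the `create_deployment` response; the helper name, the polling interval, and the timeout are arbitrary choices made for illustration. It relies on the CodeDeploy `get_deployment` call, which reports the current status under `deploymentInfo`.

```python
import time

# Statuses after which a deployment no longer changes state.
TERMINAL_STATUSES = {"Succeeded", "Failed", "Stopped"}


def wait_for_deployment(cd_client, deployment_id, poll_seconds=15,
                        timeout_seconds=3600, sleep=time.sleep):
    """Poll CodeDeploy until the deployment reaches a terminal status.

    cd_client is a CodeDeploy client such as boto3.client('codedeploy').
    The sleep argument is injectable so the loop can be exercised without waiting.
    """
    waited = 0
    while True:
        info = cd_client.get_deployment(deploymentId=deployment_id)["deploymentInfo"]
        status = info["status"]
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(
                "deployment %s still %s after %d seconds"
                % (deployment_id, status, waited)
            )
        sleep(poll_seconds)
        waited += poll_seconds
```

A typical call would be `wait_for_deployment(cd_client, response['deploymentId'])`. Boto3 also ships a `deployment_successful` waiter for the CodeDeploy client, which may be preferable when only the success case matters.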
  

The CodeDeploy “Deployment” uses an AppSpec file, which is a YAML or JSON file (the example below is JSON) that provides the resource information and execution hook information. The Resources section, as shown below, contains the Task Definition and container info. The Hooks section lets you add Lambda functions to be triggered at various points in the lifecycle of the CodeDeploy Blue/Green deployment (more details in a later section).

    {
        "version": 0.0,
        "Resources": [
            {
                "TargetService": {
                    "Type": "AWS::ECS::Service",
                    "Properties": {
                        "TaskDefinition": "ECS-TASK-DEFINITION-ARN",
                        "LoadBalancerInfo": {
                            "ContainerName": "my-container",
                            "ContainerPort": 8080
                        }
                    }
                }
            }
        ],
        "Hooks": [
            {
                "AfterAllowTestTraffic": "arn:aws:lambda:us-east-1:1234567890:function:my-green-ready-hook"
            }
        ]
    }
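
Rather than pasting the AppSpec content and its hash by hand, both can be generated in code. The sketch below builds the same JSON structure as above and computes the SHA-256 hex digest for the `sha256` field of the `revision` argument; the helper name `build_appspec_revision` and its parameters are hypothetical.

```python
import hashlib
import json


def build_appspec_revision(task_definition_arn, container_name, container_port,
                           test_hook_arn=None):
    """Return a dict usable as the `revision` argument of create_deployment."""
    appspec = {
        "version": 0.0,
        "Resources": [
            {
                "TargetService": {
                    "Type": "AWS::ECS::Service",
                    "Properties": {
                        "TaskDefinition": task_definition_arn,
                        "LoadBalancerInfo": {
                            "ContainerName": container_name,
                            "ContainerPort": container_port,
                        },
                    },
                }
            }
        ],
    }
    # The hook is optional; only add the section when a Lambda ARN is given.
    if test_hook_arn:
        appspec["Hooks"] = [{"AfterAllowTestTraffic": test_hook_arn}]

    content = json.dumps(appspec)
    return {
        "revisionType": "AppSpecContent",
        "appSpecContent": {
            "content": content,
            # Hash of exactly the string that is sent as the content.
            "sha256": hashlib.sha256(content.encode("utf-8")).hexdigest(),
        },
    }
```

The returned dict can be passed as `revision=build_appspec_revision(...)` in the `create_deployment` call shown earlier.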
  

Blue/Green Deployment in Action: The Happy Path

Initial Setup:

Here we have an ECS service using Blue/Green deployment (powered by CodeDeploy) running a Fargate task.

Note that the ECS service is running based off ECS task definition version 132 and is running one Fargate task:

Task screen with black text showing ECS- Before Blue/Green deployment

ECS - Before Blue/Green deployment

The ALB and the Target Groups attached to this ECS service:

ECS service screen with black text and blue highlighted buttons and rows, showing ALB before blue/green deployment

ALB — Before Blue/Green deployment

The listener with port 443 is the PROD traffic listener and the port 8443 is the TEST traffic listener. At this point, both the listeners are attached to the same target group ending with “-tg2”.

ECS service screen showing target groups before blue/green deployment, in black text and with blue buttons

Target Groups — Before Blue/Green deployment

Two target groups were created for this ECS service to support Blue/Green deployment using CodeDeploy. At this point, only the target group “-tg2” is running a target/task and is attached to the above ALB listener.

Let's Initiate the Blue/Green Deployment

First, we start the Blue/Green deployment for the ECS service. The deployment can be started programmatically through the ECS API or through the update-service feature of the AWS ECS console, followed by creating a new deployment from the CodeDeploy console or the CodeDeploy API. The updated code version is specified in the new ECS task definition that is used to update the ECS service.

As the ECS service’s deployment controller is CodeDeploy, deployment in CodeDeploy gets triggered and a new ECS task is started here.

Initiate Deployment screen showing blue and grey status buttons and black text

CodeDeploy — Initiate Deployment

You can see below that a new ECS task has started and the running count is now 2. Under the Deployments tab, the PRIMARY entry is for the current Blue task and the ACTIVE entry is for the new upcoming Green task.

ECS deployment in progress screen with black text and grey table rows

ECS — Blue/Green deployment in progress in CodeDeploy

Green tasks are now up and running. The PROD traffic is still being served by the Blue task in the original stack. Rollback is still possible at this stage, but keep in mind that rollback will just terminate the Green task.

Deploy screen with blue, grey, and green status bars and black text

CodeDeploy — Post creation of Green task

Now the ALB listeners have been updated. The PROD listener (Port-443) is still attached to target group “-tg2” and serving live traffic. The TEST listener (Port-8443) is attached to target group “-tg1”, which was previously not attached to the ALB at all. So you can hit the ALB DNS:8443 and test the Green stack, which is not taking any PROD traffic.

ECS screen with black text, blue button, and blue and grey highlighted table rows

ALB — Now listeners are connected to PROD and TEST target groups

Now, both “-tg1” (taking TEST traffic) and “-tg2” (taking PROD traffic) are attached to the ALB serving TEST and PROD listeners.

Target groups screen with black text and grey and blue highlighted table rows, and blue buttons

Target Groups — Both target groups now active and attached to ALB

PROD Traffic Moved to Green Task From Blue:

Traffic routing completes and the new Green task (or Replacement task) is now serving the PROD traffic. But how did traffic routing happen? Do we have any control over this? — Good questions and I will answer in a separate section :)

Rollback is still possible at this stage because the Blue task is still around and sitting idle. It will be available for the next 15 minutes (this duration is configurable through the deployment group attribute “terminationWaitTimeInMinutes”, as shown in the previous section). But how does the rollback work? Again, a good question that I will explain in another section.

Traffic rerouting status screen with blue, grey, and green status bars and black text

CodeDeploy — Traffic rerouting done

ECS is now running the task using the updated task definition version 133 (it was previously 132). Both the original and new tasks are still running, which is why rollback is still possible. The Deployments tab now shows that 100% of traffic is being served by the Replacement task, which is the new Green task.

ECS Screen detailing Post traffic rerouting from blue to green, in black text

ECS — Post traffic rerouting from blue to green

Now, both the ALB listeners PROD (Port-443) and TEST (Port-8443) are pointing to the target group “-tg1” which has the Green task as target.

Post traffic rerouting with blue and grey table rows, black text, and blue buttons

ALB — Post traffic rerouting

Fast Forward 15 mins — Blue/Green Deployment Has Been Completed

Based on the CodeDeploy “DeploymentGroup” configuration “terminationWaitTimeInMinutes”, CodeDeploy terminates the Blue task after 15 minutes. Rollback is no longer possible now as the Blue task is gone!

Status screen with blue, grey, and green status bars and black text

CodeDeploy — Post termination of Blue/original task

The ECS running task count is back to 1 as the Blue/original task is gone. The Task definition version is “133” which means the new version is serving the PROD traffic. Under Deployments tab, the ACTIVE entry is now gone.

Status screen with black text and grey table, and red boxes outlining task definition, task ID, and run count

Blue/Green Deployment in Action: Rollback Flow

Let's Recap Our Current Stable PROD State:

  • ECS service is running 1 task using task definition version 133.

  • ALB listeners both PROD (Port-443) and TEST (Port-8443) pointing to target group “-tg1”.

  • Target group “-tg1” is active and serving PROD traffic and “-tg2” is not attached to the ALB.

We want to deploy a new version now and then roll back after the traffic has been rerouted, to demo the rollback flow:

Again, Let's Initiate the Blue/Green Deployment:

Let’s fast forward and review the state where the new Blue/Green deployment reroutes the traffic to Green stack and waits for 15 mins before terminating the Original/Blue task.

We have CodeDeploy Blue/Green deployment waiting to terminate the Blue task and PROD traffic being served by the new Green/Replacement task.

Deploy screen with blue, green, and grey status bars and black text

CodeDeploy — Post rerouting traffic

The ECS service is now running a new task using the new ECS task definition version 134 (we started this deployment with 133), and our ECS running task count is 2: one with ECS task def version 133 and another with the new ECS task def version 134. The ECS Deployments tab should show 100% of traffic being served by the new PRIMARY task.

ECS screen with black text, grey tables, and red boxes outlining task definition, run count, and task ID

ECS Service - Deployment View

ALB listeners PROD (Port-443) and TEST (Port-8443) are both pointing to the target group “-tg2”. Previously they were attached to “-tg1”.

ECS screen with blue buttons, blue and grey highlighted table rows, and black text

ALB - Listeners

So, Let's Assume the Deployment Didn’t Go Well and We Have to Roll Back Now…

Let's initiate the rollback from the CodeDeploy console by clicking the “Stop and rollback deployment” button:

Stop and rollback deployment screen with black text and orange button

CodeDeploy - Stop and Rollback Message

The way the rollback works is that CodeDeploy stops the current deployment and skips the step of deleting the Original/Blue task. It then creates a new deployment to roll back the previous one and reroutes the traffic from the Replacement/Green task back to the Original/Blue task. It also terminates the Replacement/Green task.

Status screen with grey, blue, and green status bars

CodeDeploy - Rollback Deployment
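
The same stop-and-rollback action is available through the API. The following is a small sketch using the CodeDeploy `stop_deployment` call with `autoRollbackEnabled=True`; the wrapper function name is hypothetical, and the deployment ID is whatever `create_deployment` returned for the deployment being stopped.

```python
def stop_and_roll_back(cd_client, deployment_id):
    """Stop an in-progress deployment and ask CodeDeploy to roll it back.

    cd_client is a CodeDeploy client such as boto3.client('codedeploy').
    Returns the status reported for the stop request.
    """
    response = cd_client.stop_deployment(
        deploymentId=deployment_id,
        autoRollbackEnabled=True,  # reroute traffic back to the Original/Blue task
    )
    return response["status"]
```

This only works while the Original/Blue task still exists, that is, before the termination wait time has elapsed.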

Let’s Review What the ECS and ALB Setup Looks Like Post Rollback

The ECS service is now running a task using the original task def version 133 and the running task count is back to 1. The ECS Deployments tab shows 100% of PROD traffic going through the PRIMARY task.

ECS screen with black text, blue button, grey table, and red box outlining task definition, run count, and task ID

ECS Service - Post Rollback

The ALB listeners both PROD and TEST are back to the target group “-tg1”.

ECS screen with blue and grey table and blue buttons

ALB - Listeners Post Rollback

So this completes the successful rollback!

 

How to Control the Traffic Routing Between Blue/Green Tasks?

CodeDeploy allows you to attach hooks to the Blue/Green deployment pipeline. The hooks are nothing but Lambda functions that you implement.

A few example scenarios are:

  • Running functional test on the Green stack before routing PROD traffic.
  • Some environmental setup like downloading/uploading files to S3 before PROD traffic migration.

List of Lifecycle Event Hooks for an Amazon ECS Deployment (Ref: here)

  • BeforeInstall — Used to run tasks before the replacement task set is created.
  • AfterInstall — Used to run tasks after the replacement task set is created and one of the target groups is associated with it.
  • AfterAllowTestTraffic — Used to run tasks after the test listener serves traffic to the replacement task set. The results of a hook function at this point can trigger a rollback.
  • BeforeAllowTraffic — Used to run tasks after the second target group is associated with the replacement task set, but before traffic is shifted to the replacement task set. The results of a hook function at this lifecycle event can trigger a rollback.
  • AfterAllowTraffic — Used to run tasks after the second target group serves traffic to the replacement task set. The results of a hook function at this lifecycle event can trigger a rollback.

How can I add a lifecycle hook?

Lifecycle hook Lambda functions can be added through the AppSpec file that you create and attach to the CodeDeploy Deployment object. Please refer to the AppSpec example in the section above.
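
When more than one hook is used, it helps to generate the Hooks section so that event names are spelled correctly and listed in lifecycle order. This is a minimal sketch under the assumption that each event maps to one Lambda ARN; the helper name `build_hooks` is hypothetical, and the event names are the ones listed above.

```python
# ECS lifecycle events, in the order they occur during a deployment.
ECS_LIFECYCLE_EVENTS = (
    "BeforeInstall",
    "AfterInstall",
    "AfterAllowTestTraffic",
    "BeforeAllowTraffic",
    "AfterAllowTraffic",
)


def build_hooks(hook_arns):
    """Turn {event name: Lambda ARN} into the AppSpec "Hooks" list."""
    unknown = sorted(set(hook_arns) - set(ECS_LIFECYCLE_EVENTS))
    if unknown:
        raise ValueError("Unknown ECS lifecycle event(s): " + ", ".join(unknown))
    # Emit one single-key entry per configured event, in lifecycle order.
    return [
        {event: hook_arns[event]}
        for event in ECS_LIFECYCLE_EVENTS
        if event in hook_arns
    ]
```

The result can be assigned to the "Hooks" key of the AppSpec content before it is serialized.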

My implementation for the automatic rollback involves:

  • Adding a hook for the lifecycle event “AfterAllowTestTraffic”.
  • The Lambda function runs functional tests on the Green task using the ALB DNS + TEST listener port (8443).
  • If the functional test passes (i.e., the function reports “Succeeded” to CodeDeploy), the Blue/Green deployment will continue and traffic will shift to the Green stack.
  • If the test fails (i.e., the function reports “Failed” to CodeDeploy), CodeDeploy will initiate an automatic rollback of the traffic to the Blue stack.
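
The steps above can be sketched as a Lambda handler. This is a minimal illustration, not the exact function used in this post: the test URL and the `/health` path are hypothetical, and the "functional test" is reduced to a single HTTP check. The handler reads `DeploymentId` and `LifecycleEventHookExecutionId` from the hook event and reports the result with the CodeDeploy `put_lifecycle_event_hook_execution_status` call.

```python
import urllib.request

# Hypothetical endpoint: the ALB DNS name plus the TEST listener port (8443).
TEST_URL = "https://my-alb.example.com:8443/health"


def green_stack_is_healthy(url=TEST_URL, timeout=5):
    """Stand-in for real functional tests: one HTTP GET against the test listener."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, and socket timeouts
        return False


def handler(event, context, codedeploy=None, check=green_stack_is_healthy):
    """AfterAllowTestTraffic hook: test the Green stack and report the result."""
    if codedeploy is None:
        import boto3  # imported lazily; available in the Lambda Python runtime
        codedeploy = boto3.client("codedeploy")

    status = "Succeeded" if check() else "Failed"

    # Tell CodeDeploy whether to continue the deployment or roll it back.
    codedeploy.put_lifecycle_event_hook_execution_status(
        deploymentId=event["DeploymentId"],
        lifecycleEventHookExecutionId=event["LifecycleEventHookExecutionId"],
        status=status,
    )
    return status
```

The `codedeploy` and `check` parameters exist only so the handler can be exercised locally with stand-ins; Lambda itself calls `handler(event, context)`. The function's execution role also needs permission to call `codedeploy:PutLifecycleEventHookExecutionStatus`.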

 

TL;DR

ECS service deployment using AWS CodeDeploy is a powerful combination that provides easy and robust Blue/Green deployment support.

The additional deployment lifecycle hooks give you the flexibility to control the traffic routing policy per your requirements.

If you are already extensively using AWS services, and ECS is your container deployment platform, you should definitely consider this a go-to architecture over homegrown solutions.

If you want to learn about Canary deployment patterns using existing AWS services, please take a look here; it’s an excellent post on this topic!

 



Avijit Sarkar, Lead Software Engineer

Competent and dynamic IT professional enriched with the latest trends and techniques and a wide range of skill in Project Management, Quality Initiatives, Technology, Critical Thinking, Troubleshooting, Problem Analysis and Resolution.


DISCLOSURE STATEMENT: © 2020 Capital One. Opinions are those of the individual author. Unless noted otherwise in this post, Capital One is not affiliated with, nor endorsed by, any of the companies mentioned. All trademarks and other intellectual property used or displayed are property of their respective owners.
