What is cloud native? 5 principles of cloud native software
To fully leverage cloud native architecture, start by understanding what being cloud native really means.
May 25, 2022
Once upon a time, a technology was introduced that promised to radically change how businesses worked. At first, it was only used by people who bought into the vision and were willing to accept outages and complications to be part of the future. Despite wild promises of productivity increases from these early adopters, most people were hesitant and continued to do things the way they always had. But slowly, as the technology became increasingly reliable, the balance shifted. The costs of the old way of business were growing, it was harder to find skilled workers to support older technologies, and the companies that adopted the newer technology were more nimble and able to continue to innovate.
This sounds like the history of cloud computing and the adoption of cloud native software, but it's actually from over a century earlier. As Tim Harford explained in "50 Things That Made the Modern Economy", it took over 50 years for companies to start to build their businesses around electricity provided by a central power provider instead of installing their own steam-based equipment. Just like having your own data center, running your own steam plant was expensive, but seemed more reliable when electricity was new. However, as electricity technology improved, it was no longer cost-effective to rely on steam, and the workers needed to maintain that infrastructure became harder to find. This should sound familiar to anyone trying to hire skilled IT professionals.
Just as it's hard to justify building your own power plant, it's hard to compete with the scale of the major cloud providers. Relying on their services means that your operations people can focus on the problems that are unique to your company. There is no competitive advantage in racking individual servers, swapping hard drives, or checking for broken Ethernet cables.
After 100 years, no one talks about being "electric native". Using electricity is simply how work is done. And 15 years after the release of the first AWS service, most companies have either started their cloud migration or are seriously considering how to begin.
For many companies, the move to the cloud starts with a replication of their hosted environment. This is often referred to as a "lift and shift" approach, where the software on a couple of servers is copied into machine images and deployed as virtual servers in an account using AWS' EC2, Azure's Virtual Machines, or Google’s Compute Engine. Just as a data center's servers are firewalled from the outside world, network rules are set up to make sure that only the customer-facing virtual servers are externally accessible. When it's time to update your software, you take some downtime as the software on your virtual server is shut down, updated, and restarted.
If your cloud experience stops here, you'll wonder what the fuss is about. There might be some cost savings, and it's certainly faster to spin up an EC2 instance than to order and install a new physical server, but the actual day-to-day work for your developers and operations teams doesn't change terribly much. This is because you haven't adapted your tools and processes to take advantage of what the cloud offers. In short, your software isn’t cloud native yet.
What is cloud native?
But what does "cloud native" really mean? The Cloud Native Computing Foundation, or CNCF, was created as an offshoot of the Linux Foundation to "make cloud native computing ubiquitous". It shepherds many of the projects that enable cross-platform cloud native software and has crafted a definition of what it means to be cloud native:
Cloud native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach.
These techniques enable loosely coupled systems that are resilient, manageable, and observable. Combined with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal toil.
Let's go through the most important words there, take a look at what they mean, and then walk through the tools from cloud native platforms that allow us to build cloud native systems.
Five cloud native principles
There are five key principles that drive the design of cloud native software. Understanding them is critical to building your software with a cloud native architecture:
- Scalable
- Resilient
- Manageable
- Observable
- Automated
We'll start with scalable. Scalability is one of the primary motivators for moving to the cloud. The biggest drawback to running your own data center is that it takes a very long time to acquire and set up new hardware. This means that you need to reserve servers based on a guess of how much capacity you will need on your busiest day. If your company has a busy season, you have expensive excess capacity most of the year. Even worse, if you underestimate demand, your site will no longer function just when you need it the most.
Making your services scalable will likely require your developers to make changes to their software. This can be tricky, because it often means rethinking how your applications are architected. However, the payoff is worth it.
The first step is breaking a monolithic application into microservices. While you can run a monolith in the cloud, you can't increase resources for a single part of a monolith; it's all or nothing. With microservices, you can scale different functional areas of your application at different rates, depending on what they need.
Once you are thinking in terms of microservices, the next step is to think about placing those microservices into containers. Docker popularized the idea of packaging software into immutable bundles and running them in isolation, without requiring a full operating system per service. This difference between containers and virtual machines allows you to run many more containers on the same underlying hardware than you could run VMs.
What makes a microservice cloud native?
There’s nothing about microservices or containers that prevents you from deploying them in a data center. The problem is that you still need to allocate and manage the underlying servers that host them. A cloud native microservice takes advantage of the services of a cloud provider. Just as running in the cloud means you no longer have to worry about the state of your server’s network cards and fans, a cloud-native architecture means that you can avoid worrying about allocating virtual servers.
Cloud native microservices follow certain design principles. The most important is that they are designed as stateless immutable infrastructure. This means two things:
- The container hosting your microservice doesn’t store any data.
- Once you launch a container, you do not modify it.
This leads to the question: how do you make updates? In a cloud native architecture, whenever you want to change a microservice, you launch a new instance with the updates and turn off the old instance, rather than updating a single server in place as in the older approach. This is somewhat heartlessly described as the difference between pets and livestock: a pet server is cared for individually and nursed back to health when something goes wrong, while livestock instances are interchangeable and are simply replaced.
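The replace-rather-than-modify rule can be sketched in Go. The `Fleet` and `Instance` types below are hypothetical stand-ins for what a deployment tool or orchestrator tracks; the point is that `Rollout` never edits an existing instance. It creates replacements and hands back the old instances for termination.

```go
package main

import "fmt"

// Instance is a hypothetical record of one running copy of a service,
// fixed to the version it was launched with.
type Instance struct {
	ID      int
	Version string
}

// Fleet tracks the running instances of one service.
type Fleet struct {
	nextID    int
	instances []Instance
}

// newInstance "launches" a fresh instance; instances are never edited afterwards.
func (f *Fleet) newInstance(version string) Instance {
	f.nextID++
	return Instance{ID: f.nextID, Version: version}
}

// Scale launches n instances of the given version.
func (f *Fleet) Scale(n int, version string) {
	for i := 0; i < n; i++ {
		f.instances = append(f.instances, f.newInstance(version))
	}
}

// Rollout replaces every running instance with a new one at the given
// version and returns the old instances so they can be terminated.
func (f *Fleet) Rollout(version string) []Instance {
	replacements := make([]Instance, 0, len(f.instances))
	for range f.instances {
		replacements = append(replacements, f.newInstance(version))
	}
	// A real rollout would health-check the replacements here and keep
	// the old instances running if any replacement failed.
	old := f.instances
	f.instances = replacements
	return old
}

func main() {
	fleet := &Fleet{}
	fleet.Scale(3, "v1")
	old := fleet.Rollout("v2")
	fmt.Println("terminate:", old)
	fmt.Println("running:  ", fleet.instances)
}
```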
Crafting cloud native architecture
Once you’ve created these immutable instances, you are well on your way to building a cloud native architecture. Your teams can now take advantage of services from cloud providers to increase the scalability of your systems. When no one server instance is special, you can use your cloud provider's auto-scaling services, which automatically add and remove virtual server instances as the load on your system changes.
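The decision an autoscaler makes can be sketched in a few lines of Go. This sketch borrows the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current * observed / target), as a stand-in; each cloud provider's auto-scaling service has its own policies and tuning options, and the function and parameter names here are illustrative.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas scales the instance count in proportion to how far an
// observed metric (say, average CPU percent) is from its target, then clamps
// the result to the configured minimum and maximum.
func desiredReplicas(current int, observed, target float64, minReplicas, maxReplicas int) int {
	if target <= 0 {
		return current // no meaningful target: leave the fleet alone
	}
	want := int(math.Ceil(float64(current) * observed / target))
	if want < minReplicas {
		return minReplicas
	}
	if want > maxReplicas {
		return maxReplicas
	}
	return want
}

func main() {
	// 4 instances averaging 90% CPU against a 60% target: scale out.
	fmt.Println(desiredReplicas(4, 90, 60, 2, 10))
	// 4 instances averaging 30% CPU: scale in, but never below 2.
	fmt.Println(desiredReplicas(4, 30, 60, 2, 10))
}
```

Run as-is, this prints 6 and then 2. The min/max bounds matter in practice: they keep a noisy metric from scaling the fleet to zero or to an unaffordable size.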
Breaking your services down into smaller components can lead to more scalability benefits. In some cases, you can reduce a cloud native microservice down further, to just a single function. AWS calls these Lambda functions, Google calls them Cloud Functions, and Azure simply calls them Functions. These are even simpler to package than a container, often just a zip file containing some code. Your operations team only needs to configure the maximum number of functions to run simultaneously and how much memory to give each one. The cloud provider takes care of allocating the underlying machines and scaling them up and down (and even off) automatically. For infrequent processes or services that receive bursts of requests, these functions are often much more cost-effective than a container that runs all of the time.
Scaling functionality with cloud native architecture
The advantages of cloud native architectures extend far beyond the ability to scale your application’s business logic. When you have stateless, immutable infrastructure, your data still has to live somewhere. While you could run third-party databases on virtual servers, a cloud native architecture uses databases hosted by the cloud providers themselves. MySQL, PostgreSQL, and Microsoft SQL Server are available as hosted services from all three of the largest cloud providers, and AWS also offers a hosted version of Oracle. Since hosted databases are managed by your cloud provider, it's easy to allocate additional resources like disk, memory, and CPU as needed, scaling over time as your needs change.
You can also start looking at other non-relational ways to store your data. One of AWS' earliest services was S3, the Simple Storage Service. It lets you place files into "buckets" (Azure calls its version of this service Blob Storage and Google calls it Cloud Storage). There are also document databases, NoSQL databases, graph databases, data warehouses, and even private blockchains. Having these alternative data stores available through nothing more than an API is powerful: your teams can find out whether there are better solutions to their problems far faster and more cheaply than they could in a self-managed environment.
As your teams get more comfortable, they will find themselves exploring more ways to focus on your company’s core competencies. For example, consider customer identity. Rather than managing this information yourself, cloud providers (as well as third-party companies) have identity management solutions using standards like OAuth2 and OIDC. Similar solutions exist for other enabling technologies, like machine learning or batch processing. Cloud native architectures not only scale your software, they also scale your development team’s capabilities by letting you focus on what you do best.
Another key part of a cloud-native architecture is that it is resilient. What does this mean? As Matthew Titmus explains in "Cloud Native Go":
Resilience (roughly synonymous with fault tolerance) is a measure of how well a system withstands and recovers from errors and faults. A system can be considered resilient if it can continue operating correctly—possibly at a reduced level—rather than failing completely when some part of the system fails.
Just as you need to modify your software to make it more scalable, you will need to make changes to make your software more resilient. Like scalability, there are tremendous payoffs when you make your systems more resilient, because they stay running and teams aren't scrambling to fix problems.
There are many excellent resources that discuss the techniques that make services more resilient. (If your teams are writing cloud-native software in Go, "Cloud Native Go" is unsurprisingly a must-read.) These patterns center on how data flows through your services. For data coming into a service, you need to limit the amount of incoming work to what can be processed in a reasonable amount of time. If too much comes in, load needs to be shed so that the remaining requests can still be answered promptly. When your service requests data from another service, it must be written to handle the inevitable errors and timeouts that will occur.
Cloud providers also offer tools that help with resiliency, and here there is overlap with scalability. If a microservice crashes due to a rare error, an autoscaler can launch a new copy. Autoscaling also allows your systems to absorb load rather than shed it. Managed services help as well: when you use databases or data processing platforms managed by your cloud provider, you can quickly increase their resources if they need more CPU or storage.
Cloud providers also allow you to increase resiliency by spreading your services across regions. A region is a geographical area with one or more data centers, such as the East Coast of the United States or São Paulo, Brazil. Within a region, each data center is assigned to one of several availability zones. To ensure that a failure in a data center doesn't cause an outage for your company, it is recommended that you launch services across multiple availability zones. Following the principles of statelessness and treating your servers as livestock means that your system will continue to function if a single availability zone goes down and, if you also deploy to more than one region, even if an entire region does. And if you use a data store from a cloud provider, it can replicate data across availability zones and even regions for you.
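The benefit of spreading interchangeable instances across availability zones can be shown with a small Go sketch. The zone names follow AWS's naming style but are only examples, and round-robin placement is a simplifying assumption; real auto-scaling groups and schedulers have their own balancing logic.

```go
package main

import (
	"fmt"
	"sort"
)

// spreadAcrossZones assigns n instances round-robin, so each zone ends up
// with either floor(n/z) or ceil(n/z) of them.
func spreadAcrossZones(n int, zones []string) map[string]int {
	counts := make(map[string]int, len(zones))
	if len(zones) == 0 {
		return counts // nowhere to place anything
	}
	for i := 0; i < n; i++ {
		counts[zones[i%len(zones)]]++
	}
	return counts
}

// survivingCapacity reports how many instances remain if one zone fails.
func survivingCapacity(counts map[string]int, failed string) int {
	total := 0
	for zone, c := range counts {
		if zone != failed {
			total += c
		}
	}
	return total
}

func main() {
	zones := []string{"us-east-1a", "us-east-1b", "us-east-1c"} // example zone names
	counts := spreadAcrossZones(6, zones)

	names := make([]string, 0, len(counts))
	for z := range counts {
		names = append(names, z)
	}
	sort.Strings(names) // map order is random; sort for readable output
	for _, z := range names {
		fmt.Printf("%s: %d\n", z, counts[z])
	}
	fmt.Println("instances left if us-east-1b fails:", survivingCapacity(counts, "us-east-1b"))
}
```

With six instances over three zones, losing one zone still leaves four running, which is why it helps to provision enough headroom that the surviving zones can carry the load.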
Another key aspect of cloud-native computing is that it is manageable. All of these components can be viewed from a UI or have their status queried via an API. Having an API to discover and modify the state of your environment means that you can write tools to do this work in a repeatable way. It also means that you can describe the environment in a script and run that script to deploy, update, or delete your components. AWS provides a tool called CloudFormation to do this, but many companies use Terraform, a cross-platform tool from HashiCorp, to manage their environment.
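To illustrate the idea behind declarative tools like CloudFormation and Terraform, here is a toy Go sketch that compares a desired state with the actual state and lists the changes needed to reconcile them. The `Resource` and `Action` types and the plan format are hypothetical; the real tools also handle dependencies between resources, in-place versus replacement updates, and much more.

```go
package main

import (
	"fmt"
	"sort"
)

// Resource is a hypothetical, simplified description of one cloud resource.
type Resource struct {
	Type   string
	Config string
}

// Action is one step needed to move the environment to the desired state.
type Action struct {
	Op   string // "create", "update", or "delete"
	Name string
}

// plan compares desired and actual state and lists the changes needed.
func plan(desired, actual map[string]Resource) []Action {
	var actions []Action
	for name, want := range desired {
		have, exists := actual[name]
		switch {
		case !exists:
			actions = append(actions, Action{Op: "create", Name: name})
		case have != want:
			actions = append(actions, Action{Op: "update", Name: name})
		}
	}
	for name := range actual {
		if _, keep := desired[name]; !keep {
			actions = append(actions, Action{Op: "delete", Name: name})
		}
	}
	// Map iteration order is random; sort for a stable, reviewable plan.
	sort.Slice(actions, func(i, j int) bool { return actions[i].Name < actions[j].Name })
	return actions
}

func main() {
	desired := map[string]Resource{
		"web":   {Type: "vm", Config: "size=large"},
		"queue": {Type: "queue", Config: "fifo"},
	}
	actual := map[string]Resource{
		"web":    {Type: "vm", Config: "size=small"},
		"old-db": {Type: "database", Config: "postgres"},
	}
	for _, a := range plan(desired, actual) {
		fmt.Println(a.Op, a.Name)
	}
}
```

Because the plan is computed from data rather than typed by hand, running it twice against the same environment gives the same answer, which is the repeatability the paragraph above describes.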
Closely related to manageability is observability. Once you have multiple components running together, you want to understand what they are doing. You also want to know when something goes wrong. Even if your developers design for resiliency, your operations people still need to know about problems as soon as they happen to prevent the situation from getting worse. Amazon provides a service called CloudWatch for this. CloudWatch collects data from AWS about how your application is running, along with metrics on how it is performing. Your application's logs can be sent to CloudWatch as well, so that you see the information from your code alongside the information captured by AWS.
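As an illustration of turning metrics into timely alerts, here is a Go sketch of a threshold alarm that fires only after several consecutive bad datapoints, so a single spike does not page anyone. It is loosely modeled on the way CloudWatch alarms evaluate a metric over a number of periods, but the `Alarm` type and its fields are hypothetical and ignore details such as missing data.

```go
package main

import "fmt"

// Alarm is a simplified threshold alarm: it fires only after the metric
// breaches the threshold for EvaluationPeriods consecutive datapoints.
type Alarm struct {
	Threshold         float64
	EvaluationPeriods int
	breaching         int // consecutive datapoints above the threshold
	State             string
}

// Observe records one datapoint (for example, the error rate for a minute)
// and returns the alarm state afterwards.
func (a *Alarm) Observe(value float64) string {
	if value > a.Threshold {
		a.breaching++
	} else {
		a.breaching = 0 // a healthy datapoint resets the streak
	}
	if a.breaching >= a.EvaluationPeriods {
		a.State = "ALARM"
	} else {
		a.State = "OK"
	}
	return a.State
}

func main() {
	// Hypothetical per-minute error rates (percent) for one service.
	errorRates := []float64{0.2, 6.0, 0.3, 5.5, 7.1, 8.4}
	alarm := &Alarm{Threshold: 5.0, EvaluationPeriods: 3}
	for minute, rate := range errorRates {
		fmt.Printf("minute %d: %.1f%% -> %s\n", minute, rate, alarm.Observe(rate))
	}
}
```

With the sample data, the alarm stays OK through the isolated spike at minute 1 and moves to ALARM at minute 5, after three consecutive readings above 5%.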
In addition to observing your systems as they are running, it is also helpful to observe the API calls to your cloud provider that configure your system. These calls can tell you whether systems are configured correctly and can help you detect malicious activity. AWS uses CloudTrail to report on API calls, Google has Cloud Audit Logs, and Azure's Monitor service tracks API calls as well as application performance.
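Audit logs are most useful when something reads them automatically. Here is a Go sketch that scans a CloudTrail-style log document for a few sensitive API calls. It uses only the `eventName`, `eventSource`, and `sourceIPAddress` fields of CloudTrail records; the short watch list is an illustrative assumption, and Google's and Azure's audit logs use different field names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// auditEvent holds a few of the fields found in CloudTrail records.
type auditEvent struct {
	EventName       string `json:"eventName"`
	EventSource     string `json:"eventSource"`
	SourceIPAddress string `json:"sourceIPAddress"`
}

// watched is an illustrative, deliberately short list of API calls worth a
// closer look; a real policy would be tailored to your environment.
var watched = map[string]string{
	"StopLogging":                   "audit logging was turned off",
	"DeleteTrail":                   "an audit trail was deleted",
	"AuthorizeSecurityGroupIngress": "a firewall rule was opened",
}

// flagEvents parses a CloudTrail-style log document and returns one
// message per watched API call.
func flagEvents(logJSON []byte) ([]string, error) {
	var doc struct {
		Records []auditEvent `json:"Records"`
	}
	if err := json.Unmarshal(logJSON, &doc); err != nil {
		return nil, err
	}
	var findings []string
	for _, ev := range doc.Records {
		if why, ok := watched[ev.EventName]; ok {
			findings = append(findings, fmt.Sprintf("%s (%s) from %s: %s",
				ev.EventName, ev.EventSource, ev.SourceIPAddress, why))
		}
	}
	return findings, nil
}

func main() {
	sample := []byte(`{"Records":[
		{"eventName":"DescribeInstances","eventSource":"ec2.amazonaws.com","sourceIPAddress":"203.0.113.10"},
		{"eventName":"StopLogging","eventSource":"cloudtrail.amazonaws.com","sourceIPAddress":"198.51.100.7"}
	]}`)
	findings, err := flagEvents(sample)
	if err != nil {
		panic(err)
	}
	for _, f := range findings {
		fmt.Println(f)
	}
}
```

Only the StopLogging record is flagged in the sample; the routine DescribeInstances call passes through.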
Finally, you need to rely on automation to ensure consistency in your cloud environment. Automation ties all of our cloud native principles together. Scalability is possible because we automate the deployment of immutable infrastructure. Systems are more resilient when we can automatically restart them on failure or when they automatically fail over to a backup system when they detect a problem in a dependency. Automated management tools allow you to keep track of what is running, and automation allows you to find out when your observable systems are misbehaving.
There are more ways that automation enables cloud native software. When you are releasing new versions, you don't want a system administrator installing software by hand. Instead, you should take advantage of deployment pipelines that automate the build, test, and deployment process, like AWS CodePipeline, Google Cloud Build, or Azure Pipelines. Automation ensures consistency and allows you to do things like roll out a new version of your software to a limited subset of your servers (often called a canary release) to see if it functions correctly before rolling it out everywhere.
In addition to improving the deployment experience for cloud-native software, automation also helps with the management of your environment. You need to make sure that all of the components of your software are configured correctly. This includes things like validating access permissions, ensuring that only customer-facing applications are exposed to the public internet, or making sure that all of your cloud resources are properly tagged with information to identify which team is the owner. You might also want to implement cloud cost optimization measures like turning off components in a QA environment when the engineers are sleeping and turning them back on again when they return to work.
Going cloud native is worth it
As you’ve seen, the goal of building cloud native applications isn’t to be up to date on the latest buzzwords. Once you’ve followed these principles and redesigned your applications with a cloud native architecture, your company can produce more reliable software that better serves the needs of your internal teams and your customers.
For companies that are invested in a data center and older technologies, this isn’t going to be easy. As Tim Harford said in his article on the transition from steam to electricity and the adoption of computer technology:
The thing about a revolutionary technology is that it changes everything - that's why we call it revolutionary. And changing everything takes time and imagination and courage - and sometimes just a lot of hard work.
Just as companies 100 years ago had to make changes to their infrastructure as they moved from steam to electricity, becoming cloud native means your developers and operations teams will need to make changes as they improve the scalability, resilience, manageability, observability, and automation of your software and its environment. It requires changing some development patterns and embracing a cloud native architecture by using the tools from cloud providers. However, the payoffs are remarkable. Welcome to the future.