Essential Cloud Governance Disciplines and Best Practices


If implemented appropriately, good cloud governance enables an organization to fully realize the business benefits of cloud computing while managing the risks associated with this new operational paradigm. In this post I share some insights for applying best practices to a cloud governance strategy, based on my program work at Capital One, the first U.S. bank to exit legacy data centers and go all-in on the public cloud.

What is Cloud Governance?

There are various simplified definitions of cloud governance. Some define it as a set of rules that must be complied with. Others define it as controls to manage access, budgets, and cloud compliance, or as a way to create rules, monitor them, and adjust where necessary to achieve business objectives.

Consider these definitions:

NIST definition of Cloud Computing (i.e., Cloud): “Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”

Governance: “the act or process of governing or overseeing the control and direction of something (such as a country or an organization)”.

Based on the above definitions and my experience at Capital One, here is how I summarize cloud governance: 

Cloud governance is the overall way in which an organization oversees the control and direction (e.g. manages the use) of cloud services and resources from cloud service providers, and the processes and systems for doing so.

Frequently the term “cloud governance” is used interchangeably with corporate policies, standards, and procedures (PSPs) that pertain to cloud computing operations. In some situations this conflation of concepts is useful to simplify the concept of cloud governance for organizations early in their cloud journey. 

For major organizations in regulated industries, it is important to separate the concepts of cloud governance and PSPs. This is because cloud governance is much larger in scope; it embodies the systems and the cloud-platform-specific implementations that prevent, identify, and correct deviations from the PSP-defined requirements. For regulated organizations operating in the cloud, the interpretation of controls and the way they are implemented really matter. This is the only way that an organization can ensure that it is meeting the spirit of the PSP requirements, which are critical for managing operational and security risk.

Good cloud governance should be based on cloud native thinking in order to appropriately manage cloud computing risks and achieve its benefits. 

What is the Benefit of Cloud Governance?

Effective cloud governance, based on a well-defined cloud governance framework, helps your organization fully realize the benefits of the cloud while holistically managing costs, and operational and security risks. The need for governance, i.e. oversight and direction, is even more important in the cloud as physical constraints of infrastructure capacity, capability, configuration, and speed are removed from application teams. 

Cloud computing solutions provide nearly instant compute and data storage capabilities on demand, with virtually infinite capacity. They are globally accessible, orders of magnitude lower cost than company-owned infrastructure, and are billed only as additional resources are consumed. All of the resources for an entire data center can be deleted and rebuilt repeatedly with one line of code executed automatically via an unattended process, anywhere in the world—at the speed of light. 

This is a new paradigm requiring a culture change. Traditional approaches to information technology infrastructure and compliance are not effective when applied to the cloud, because the cloud differs from data center architecture as much as real-time computing differs from batch computing. The cloud is real-time. Cloud governance must encompass the processes for defining and operating new and tailored policies, standards, and procedures via automated systems, so that it can manage the speed and scale that cloud computing provides.


Establishing a Cloud Governance Framework

There are several cloud governance frameworks available. In my experience, the five pillars of the AWS Well-Architected Framework represent the most “cloud centric” disciplines. They are listed here in the order I recommend:

  • Security
  • Cost Optimization
  • Operational Excellence
  • Reliability
  • Performance Efficiency 

These disciplines and best practices should be prioritized based on your organization's unique set of business objectives, risk profile and risk tolerance, and cloud maturity. Ideally, all of the framework elements should be incorporated to some degree from the beginning of your cloud governance program. In the next section, I’ll discuss how to apply important components of the best practices of this particular framework to form a holistic cloud governance strategy.

Just as it takes an athlete years of training and practice to hone the skills and abilities necessary to compete at an elite level, it takes time to develop and refine all of the standards and automation capabilities necessary to become proficient with cloud governance best practices. In other words, it's best to crawl, walk, then run. For the ‘crawl’ stage, the makeup of your cloud governance foundation and the capabilities that are needed for success depend upon your organization’s needs. Specifically, it is important to balance your business objectives for using the cloud, your DevOps and cloud native maturity, and your operational priorities (reliability, security, feature deployment speed, cost, etc.) while achieving system confidentiality, integrity, and availability.

Applying Best Practices to a Cloud Governance Strategy

Cloud governance best practices begin with the Cloud Service Provider’s (CSP) shared responsibility model, which defines what your organization is responsible for protecting in each service you use.

The way in which cloud governance best practices are implemented matters. If your organization implements 10 different patterns for achieving the same operational result, the ability to automate that best practice will be limited. Standardization of well-defined cloud infrastructure, configuration patterns, and controls is required to automate the best practices of your cloud governance program to keep up with the cloud’s speed and scale. 

Capital One created the Cloud Custodian automation tool to assist with the monitoring and remediation of both cloud resource configuration and custodial actions through a standard set of enterprise policies. Last year, Capital One donated the Cloud Custodian open source project to the Cloud Native Computing Foundation.

Security

Don’t presume that all services offered by a CSP have consistent security features. The customer and the CSP share the responsibility to prevent unauthorized events such as:

  • data sharing
  • internet access to the resource
  • access to the resource by other tenants

However, a customer that chooses to use a particular service is responsible for configuring it to meet its own requirements, as defined by the service-specific customer responsibilities. For newly released services, CSPs may not provide fine-grained access control, views of administrative activity, data access logging, or even encryption of data at rest. A group of services may or may not share a common set of capabilities, because each service is typically built by an agile product team that launches a series of Minimum Viable Products (MVPs) to serve the needs of a target customer persona, and then matures each product with additional features over time. This means that your organization may not be able to fulfill its fundamental responsibilities, i.e. meet minimum requirements or standards, for safely configuring and using every service offered by a CSP.

The place to start is to assess the cloud platform's framework components against your organizational standards: Identity and Access Management and networking, as they relate to workload separation and isolation, along with the available configuration options. Then assess each service that may be deployed against that requirements baseline, to define which configuration parameters and controls are required to protect your application workloads for each of your data classification categories. These configurations and controls should be automatically monitored and maintained through the use of compliance tools.
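One way to picture the automated monitoring described above is a baseline check that compares each resource's configuration with the settings required for its data classification. The Python sketch below is illustrative only: the classification names, the setting keys, and the `find_deviations` helper are assumptions made for this example, not part of any CSP API or compliance tool.

```python
from typing import Any

# Hypothetical baseline: required settings per data classification.
# The classification names and setting keys are illustrative assumptions.
BASELINE: dict[str, dict[str, Any]] = {
    "public": {"encrypted_at_rest": True},
    "confidential": {
        "encrypted_at_rest": True,
        "publicly_accessible": False,
        "access_logging": True,
    },
}


def find_deviations(resource: dict[str, Any]) -> list[str]:
    """Return the names of required settings the resource does not meet."""
    required = BASELINE[resource["classification"]]
    config = resource.get("config", {})
    # A missing setting counts as a deviation, because the service may
    # simply not offer that control yet.
    return sorted(
        key for key, expected in required.items()
        if config.get(key) != expected
    )


if __name__ == "__main__":
    bucket = {
        "id": "example-storage-1",
        "classification": "confidential",
        "config": {"encrypted_at_rest": True, "publicly_accessible": True},
    }
    for setting in find_deviations(bucket):
        print(f"{bucket['id']}: '{setting}' does not meet the baseline")
```

In this example the resource is flagged twice: once for being publicly accessible, and once because access logging is not configured at all. A real compliance tool would read the configuration from the CSP and could also remediate the deviation.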

Cost (Optimization)

Begin from day 1 with an effective implementation of basic cost management controls and tools, and optimize later in the cloud governance maturation journey. Even though computing and storage cost rates are lower in the cloud, you pay for what you use. As physical constraints of infrastructure capacity and speed of availability don’t exist in the cloud, many organizations have been shocked by massive cloud bills that are driven by resource sprawl. This can include but is not limited to:

  • Continuously running development and test environments
  • Large scale evaluation and testing infrastructure not deleted after use
  • Endless backup and replicated copies of unneeded and unused data
  • Virtual machine, database, and other snapshots
  • Over provisioned resources

The availability of virtually instant full snapshots and unlimited capacity, combined with the scale of the cloud, is likely to drive cloud resource usage, and therefore cost, well beyond what was planned. Automated custodial tools are what keep actual consumption in line with the plan. At the end of the day, without proper cloud cost optimization you lose out on key advantages that the cloud provides to begin with.
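To make the idea of automated custodial checks concrete, here is a minimal Python sketch that scans a resource inventory for two of the sprawl patterns listed above. The `Resource` record, its field names, and both thresholds are hypothetical; a real tool would pull the inventory from the CSP and take its limits from your own standards.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Resource:
    """Simplified inventory record; every field name is illustrative."""
    resource_id: str
    kind: str            # e.g. "snapshot" or "instance"
    environment: str     # e.g. "dev", "test", "prod"
    created_at: datetime
    running: bool = False


# Hypothetical thresholds, chosen only for this example.
SNAPSHOT_RETENTION = timedelta(days=30)
NON_PROD_MAX_AGE = timedelta(hours=12)


def custodial_findings(resources: list[Resource], now: datetime) -> list[tuple[str, str]]:
    """Return (resource_id, reason) pairs for likely resource sprawl."""
    findings = []
    for r in resources:
        # Simplification: time since creation stands in for uptime.
        age = now - r.created_at
        if r.kind == "snapshot" and age > SNAPSHOT_RETENTION:
            findings.append((r.resource_id, "snapshot older than retention period"))
        elif (r.kind == "instance" and r.running
              and r.environment != "prod" and age > NON_PROD_MAX_AGE):
            findings.append((r.resource_id, "non-production instance left running"))
    return findings


if __name__ == "__main__":
    now = datetime(2021, 6, 1, tzinfo=timezone.utc)
    inventory = [
        Resource("snap-old", "snapshot", "dev", now - timedelta(days=90)),
        Resource("vm-test", "instance", "test", now - timedelta(days=3), running=True),
        Resource("vm-prod", "instance", "prod", now - timedelta(days=3), running=True),
    ]
    for resource_id, reason in custodial_findings(inventory, now):
        print(f"{resource_id}: {reason}")
```

Run on a schedule, a check like this turns cost policy into something that is enforced continuously rather than discovered on the monthly bill.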

Operational Excellence

The operational excellence best practice recommended at the beginning of a cloud governance program is to perform all cloud infrastructure operations as code, for all environments. Infrastructure as code facilitates cloud native thinking and enables consistent, accurate, and compliant creation of infrastructure resources, repeatedly. 
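The repeatability that infrastructure as code provides comes from declaring the desired state and letting automation work out the difference from what is deployed. The Python sketch below illustrates that idea in a deliberately simplified form; `plan_changes` and the resource dictionaries are hypothetical and do not reflect how any particular infrastructure-as-code tool is implemented.

```python
def plan_changes(desired: dict[str, dict], actual: dict[str, dict]) -> dict[str, list[str]]:
    """Compare declared resources with deployed ones and list the needed actions."""
    return {
        # Declared but not yet deployed.
        "create": sorted(name for name in desired if name not in actual),
        # Deployed, but the configuration has drifted from the declaration.
        "update": sorted(
            name for name in desired
            if name in actual and desired[name] != actual[name]
        ),
        # Deployed but no longer declared.
        "delete": sorted(name for name in actual if name not in desired),
    }


if __name__ == "__main__":
    desired = {
        "app-bucket": {"encrypted": True},
        "app-queue": {"retention_days": 4},
    }
    actual = {
        "app-bucket": {"encrypted": False},
        "old-vm": {"size": "small"},
    }
    print(plan_changes(desired, actual))
    # Once the environment matches the declaration, the plan is empty.
    print(plan_changes(desired, desired))
```

Because the same declaration always produces the same plan for a given starting state, every environment can be built and checked the same way.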

It’s critical to understand the characteristics and volumes for workloads and to ensure that service quotas (rate limits) and network topology are configured sufficiently to accommodate them. In addition to the workload itself, the workload characteristics need to account for the additional usage and rates of your automated monitoring and other capabilities operating with the workload.
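The arithmetic behind this point is simple but easy to overlook. The Python sketch below, using made-up request rates and a hypothetical `quota_headroom` helper, shows how monitoring traffic eats into the headroom left under a service quota.

```python
def quota_headroom(workload_rps: float, overhead_rps: float, quota_rps: float) -> float:
    """Return the fraction of the quota left after workload and tooling calls."""
    if quota_rps <= 0:
        raise ValueError("quota_rps must be positive")
    return 1.0 - (workload_rps + overhead_rps) / quota_rps


if __name__ == "__main__":
    # Illustrative numbers only: requests per second against one service.
    without_tooling = quota_headroom(workload_rps=800, overhead_rps=0, quota_rps=1000)
    with_tooling = quota_headroom(workload_rps=800, overhead_rps=150, quota_rps=1000)
    print(f"headroom ignoring monitoring:  {without_tooling:.0%}")
    print(f"headroom including monitoring: {with_tooling:.0%}")
```

With these numbers the apparent 20% headroom shrinks to about 5% once monitoring calls are counted, and a negative result would mean the quota is already exceeded.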

Performance (Efficiency)

In terms of performance, consider standardizing on the use of Platform-as-a-Service (PaaS, also known as “managed services”) for application workloads instead of managing the provisioning and scaling of individual compute instances. The use of managed services transfers more of the responsibilities to the cloud platform provider at the expense of losing some access, control and transparency of the underlying compute and storage systems. Your assessment of the managed service should indicate if there is sufficient configuration control and insight in order to achieve your organization's security and operational requirements. 

Reliability

The workload architecture should be designed to not only prevent failure scenarios, but also to automatically detect and mitigate failures or changes in workload demand. Data redundancy, fault detection, and automatic scaling capabilities of cloud services vary from no capability to global capability. Managed cloud services typically have inherent within-region data redundancy and automatic capacity scaling capabilities, and some automatically replicate data across regions and scale workload processing capacity globally. The Service Level Agreement (SLA) terms for the services used should align with the resiliency requirements for the workload. For workloads operating on groups of individual compute and storage resources (Infrastructure-as-a-Service), the resiliency factors previously identified will need to be implemented appropriately to support the workload’s reliability requirements.
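A quick way to check whether service SLAs line up with a workload's resiliency requirement is to combine the individual availability figures. The Python sketch below uses the standard formulas for components in series and for redundant copies. It assumes failures are independent, which is a simplification, and the availability numbers are illustrative rather than any provider's actual SLA.

```python
from math import prod


def serial_availability(availabilities: list[float]) -> float:
    """Availability of a workload that needs every listed service to be up."""
    return prod(availabilities)


def redundant_availability(availability: float, copies: int) -> float:
    """Availability when any one of several independent copies is enough."""
    return 1.0 - (1.0 - availability) ** copies


if __name__ == "__main__":
    target = 0.9995  # hypothetical workload requirement

    # Three services, each with an illustrative 99.9% availability figure.
    combined = serial_availability([0.999, 0.999, 0.999])
    print(f"three services in series: {combined:.6f}, meets target: {combined >= target}")

    # One component deployed as two independent copies at 99% each.
    paired = redundant_availability(0.99, copies=2)
    print(f"two redundant copies:     {paired:.6f}")
```

Three services at 99.9% each combine to roughly 99.7%, below the example target, while two independent copies of a 99% component reach about 99.99%. This is why a workload built on several services often needs redundancy to meet a requirement that each service appears to satisfy on its own.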

Final Thoughts on Cloud Governance Strategy

Establishing a cloud governance strategy using the above insights is key to minimizing risk and optimizing organizational efficiency. As your cloud governance program matures, your approach to each best practice will likely evolve to meet the changing needs of your business. Nevertheless, a strong framework will provide lasting benefits for your company and customers.


David Leigh, Senior Risk Technical Program Manager, Cloud Platform Security

David Leigh is a Senior Risk Technical Program Manager in Cloud Platform Security at Capital One. For the past four years he’s focused on implementing and refining Capital One’s enterprise Cloud Governance program. David’s experience with cloud service assessment, controls implementation, monitoring, risk management, and enterprise reporting spans the three major cloud platforms. Prior to joining Capital One, David was president and co-founder of Rofori Corporation / DEFCON CYBER™, a SaaS solution for continuously measuring an organization’s Cybersecurity Posture, and held product and program manager roles for information technology systems across different industries. You can connect with David on LinkedIn (https://www.linkedin.com/in/david-leigh-123a033a).


DISCLOSURE STATEMENT: © 2021 Capital One. Opinions are those of the individual author. Unless noted otherwise in this post, Capital One is not affiliated with, nor endorsed by, any of the companies mentioned. All trademarks and other intellectual property used or displayed are property of their respective owners.
