Elements of effective cloud compliance

Understanding compliance and key best practices for operating successfully in the cloud.

As consumers and businesses adopt cloud-based technologies, they face a set of threats and risks that must be addressed before cloud adoption and its associated benefits can be realized. Cloud compliance helps protect against the vulnerabilities that can expose private and protected information, so that organizations can leverage the advantages of the cloud safely.

What is cloud compliance?

Cloud compliance is a set of systematic operations that ensure a business runs in a compliant way while protecting the organization’s resources, whether network, compute, or storage. It maps to a wide range of regulations and best practices that organizations are expected to follow when using systems and services through cloud capabilities. The International Organization for Standardization (ISO) is a well-established body that defines many of the standards referenced in the use of cloud. There are many regulations with which an organization must maintain compliance. The following are some of the most widely referenced:

  • GDPR
  • SOX
  • FedRAMP

The steps to successful cloud adoption must include a foundational layer of compliance that operates in accordance with regulatory and risk management requirements. A collaborative model of technology, process, and people is at the root of such an ecosystem. Compliance is represented as a set of policies or rules that continuously consume machine state data, analyze it for risks, and report on vulnerabilities. Setting up a dedicated organization chartered with implementing cloud compliance standards can help steer the business and engineering community toward changing behavior and adopting best practices to leverage cloud computing effectively.

Elements of cloud compliance framework

There are five building blocks for a well-rounded cloud compliance framework:

  • Service assessment and vulnerability analysis
  • Collecting machine data and machine events
  • Developing controls and policies
  • Processing machine events against policies 
  • Identifying non-compliance and taking action to remediate 

The majority of cloud service providers support a mechanism in which machine events are shared as a stream or through APIs that can be consumed. Once machine events are available, rules can be applied on top of them to evaluate compliance. This process is time sensitive and must also handle a large volume of machine events. It demands a high-performance platform that can be trusted to evaluate without failures or errors, with upward of 99.9999% accuracy and availability around the clock.

There are many proven patterns to divide and conquer such huge volumes of machine events at high speeds. It is also important to set guardrails to ensure that even the most granular change to any cloud resource is taken into consideration with a broader context of its ecosystem, like account, VPC, and IAM, and not just the event itself in isolation. 
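To make the idea of evaluating a change within its broader context more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the event fields (`resource_id`, `change`), the `ResourceContext` record, the lookup table, and the rule itself are hypothetical and do not reflect any particular cloud provider's event format.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ResourceContext:
    """Hypothetical ecosystem context for a cloud resource."""
    account_id: str
    vpc_id: str
    iam_role: str
    environment: str


# Hypothetical inventory mapping resource IDs to their context.
CONTEXT_BY_RESOURCE: Dict[str, ResourceContext] = {
    "bucket-123": ResourceContext("acct-01", "vpc-0a1", "role/app-writer", "production"),
}


def enrich(event: dict, contexts: Dict[str, ResourceContext]) -> dict:
    """Return a copy of the event with its ecosystem context attached."""
    enriched = dict(event)
    enriched["context"] = contexts.get(event["resource_id"])
    return enriched


def is_risky(enriched: dict) -> bool:
    """Evaluate the change together with its context, not in isolation."""
    context: Optional[ResourceContext] = enriched["context"]
    if context is None:
        # Assumption: a resource with no known context is treated as risky.
        return True
    return enriched["change"] == "made_public" and context.environment == "production"
```

In this sketch the same `made_public` change would be judged differently depending on the environment recorded in the context, which is the point of enriching the event before evaluation.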

As engineers and product owners adopt newer cloud-based tools and services, the challenge will be to ensure the organization is ready to roll out controls or policies that help engineers use those tools efficiently and responsibly. This implies that cloud governance has established a strong partnership with cloud compliance, cloud providers, regulators, and cyber and audit groups, who can collectively look ahead to new cloud capabilities that could be adopted into the organization and identify what vulnerabilities they would possess or pose when integrated with existing resources and applications.

The following five elements are critical to implement successfully for a holistic cloud compliance strategy that provides visibility and protection for an organization.

Service assessment and vulnerability analysis

Cloud providers release new capabilities and services very often to keep up with competition and provide differentiated value to their customers. They also change or deprecate existing APIs and services. As a consumer, it is therefore important to assess new services as they are released and to reassess existing services periodically for changes. These assessments (service assessments) are meant to identify vulnerabilities associated with the native service. The scope of the assessment also includes the design and implementation of these cloud native services specific to your organization and your organization’s best practices.

Service assessments need strong cloud architects, risk and compliance subject-matter experts (SMEs), engineers, and product owners to come together to evaluate various aspects of new and existing cloud native services. After the assessment, a report is shared that clearly describes the risk (if any), recommends a set of controls to be implemented before broader adoption, and suggests which service usage reports and service compliance reports should be collected and regularly reviewed to address the risks associated with adopting the service.

To ensure these assessments happen in an unbiased way, an independent group or a department like Cyber is chartered to manage and report on them.

Collecting machine data and machine events

Identifying vulnerabilities is time sensitive, and the value of detection depreciates as time passes. In many cases, though, the window of opportunity to react is much longer, which lets cloud compliance platforms leverage a wide variety of data sources. In general, there are three main options for machine data:

  • Cloud native resources (e.g., CloudTrail)
  • Tool-based resources (e.g., ServiceNow)
  • Direct resources (e.g., APIs)

Direct resources via APIs are typically the fastest and easiest to process, but they get expensive as the volume of API calls goes up. Processing all machine data as events from API sources demands massive and resilient compliance platforms. There needs to be a balance between how many events are processed in real time and how much data is batched for evaluation of vulnerabilities.
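One way to picture the balance between real-time and batch evaluation is a small router that evaluates the riskiest events immediately and queues the rest. The following Python sketch is illustrative only: the `criticality` field, the choice of which levels go to real time, and the batch size are all assumptions rather than a recommended configuration.

```python
from collections import deque
from typing import Deque, List

# Assumption: only these criticality levels justify real-time evaluation.
REAL_TIME_CRITICALITIES = {"critical", "high"}


class EventRouter:
    """Hypothetical router that splits events between real-time and batch paths."""

    def __init__(self, batch_size: int = 3) -> None:
        self.batch_size = batch_size
        self.pending: Deque[dict] = deque()
        self.evaluated_now: List[dict] = []
        self.evaluated_batches: List[List[dict]] = []

    def submit(self, event: dict) -> None:
        if event.get("criticality") in REAL_TIME_CRITICALITIES:
            # Real-time path: evaluate as soon as the event arrives.
            self.evaluated_now.append(event)
            return
        # Batch path: hold the event until enough have accumulated.
        self.pending.append(event)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Evaluate everything currently queued as one batch."""
        if self.pending:
            self.evaluated_batches.append(list(self.pending))
            self.pending.clear()
```

A real platform would also flush on a timer so that low-criticality events are never held indefinitely; that is left out here to keep the sketch short.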

Collecting machine data and then classifying and categorizing it by resource type, criticality, risk, or vulnerability is fundamental, because it helps distribute the processing. A key differentiator could be to build a data layer that is always available and democratized, so that multiple lines of business or departments can build custom algorithms on it to identify risk. As data gets democratized across the organization, it is important to have clear disclosures associated with each stream or lake of data. These disclosures include age of data, source of data, signatures, and risk category.
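The disclosures listed above (age, source, signatures, and risk category) could travel with each stream as a small metadata record. This is a minimal Python sketch under stated assumptions: the `StreamDisclosure` name, its fields, and the freshness check are hypothetical, and the signature is shown as an opaque string rather than a real cryptographic verification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class StreamDisclosure:
    """Hypothetical disclosure record attached to a stream or lake of machine data."""
    source: str            # where the data came from
    collected_at: datetime  # used to derive the age of the data
    signature: str         # opaque integrity marker (assumption: verified elsewhere)
    risk_category: str     # e.g. "regulatory", "operational", "security"

    def age(self, now: datetime) -> timedelta:
        return now - self.collected_at

    def is_fresh(self, now: datetime, max_age: timedelta) -> bool:
        """Consumers can decide for themselves how stale is too stale."""
        return self.age(now) <= max_age


disclosure = StreamDisclosure(
    source="cloud-native-logs",
    collected_at=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    signature="sha256:placeholder",
    risk_category="security",
)
```

Publishing the age alongside the data lets each consuming team apply its own tolerance instead of assuming every stream is real time.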

Further, as the cloud footprint grows, it will become necessary to establish a group or a department completely focused on managing the data layer of machine events. Many organizations that have large footprints on the cloud also provide learning platforms around machine data, building intelligence on top of policies and rules so that risk starts to become predictable.

Developing controls and policies

It is hard to say which happens faster: the dark web learning to leverage resource-based vulnerabilities against organizations, or organizations learning to develop policies and rules that protect against such vulnerabilities. It’s all about the length of the learning curve. Cloud SMEs play a very big role here in assessing new service offerings from cloud providers, not just broadly but deeply in the context of how a business wants to leverage new cloud-based resources or products. Assessments are generally led by a group of SMEs who understand risk (technical, financial, and operational) and who come together to score a particular cloud-based tool or resource. The group then creates a risk score that expresses the criticality as Low/Medium/High/Critical. These assessments are not scheduled at fixed intervals; they happen as a standard process throughout the year and include reassessments of existing services to account for changes to service APIs or deprecations.
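As a rough illustration of how SME scores might be turned into a Low/Medium/High/Critical rating, consider the Python sketch below. The 1 to 5 scale, the use of the worst-scoring dimension, and the thresholds are all assumptions chosen for the example; an actual scoring model would be defined by the organization's risk group.

```python
from statistics import mean
from typing import Dict, List

DIMENSIONS = ("technical", "financial", "operational")

# Hypothetical thresholds on a 1 (lowest risk) to 5 (highest risk) scale.
BANDS = [(4.0, "Critical"), (3.0, "High"), (2.0, "Medium")]


def risk_band(sme_scores: List[Dict[str, int]]) -> str:
    """Average each dimension across SMEs, then rate by the worst dimension."""
    worst = max(mean(score[d] for score in sme_scores) for d in DIMENSIONS)
    for threshold, label in BANDS:
        if worst >= threshold:
            return label
    return "Low"
```

Rating by the worst dimension is a deliberately conservative choice in this sketch: a service that is cheap and easy to operate is still flagged if its technical risk is high.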

Much needs to be sorted out on how these policies are released based on how many cloud accounts need to be provisioned. For example: 

  • Should the policies be released to all accounts at the same time or progressively?
  • How many applications are impacted by the release of these policies?
  • Does the velocity of product development get affected by such a release, and do the policies have built-in actions that may vary by environment?

It is evident that the release management of these policies has the potential to affect the larger organization both positively and negatively. On the positive side, the organization would be more secure, because these policies restrict the number of patterns in which cloud-based resources or services are leveraged, designed, and integrated with business products. On the negative side, this adds work for application teams, who must retroactively modify existing live applications and/or change upcoming designs and implementations.

End user impact analysis is a necessary wing of policy development that attempts to minimize the impact of policies by automating remediation on behalf of the application teams. However, this requires a trusted relationship between application teams and cloud compliance organizations. In certain situations, cloud compliance teams will not have full visibility into the rationale or the implications of a particular design of a component that uses a cloud resource. In such scenarios, it is ideal to report non-compliance as a notification and give the application teams a window in which to own the remediation. To fully understand the impact on end users or on the applications themselves, it is important to be able to create test scenarios in which resources are created, evaluated, and remediated, with every step recorded and further analyzed to clearly identify the impacts that releasing the policies would cause. This part of controls testing or policy testing is quite a puzzle and often exhausting, but very rewarding.
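The choice between remediating automatically and notifying the application team can be expressed as a small decision function. The Python sketch below is only an illustration of that trade-off: the `Action` type, the `has_full_visibility` flag, and the per-environment remediation windows are hypothetical values, not recommended settings.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

# Hypothetical remediation windows; real values are an organizational decision.
REMEDIATION_WINDOWS = {
    "development": timedelta(days=14),
    "production": timedelta(days=3),
}
DEFAULT_WINDOW = timedelta(days=7)


@dataclass(frozen=True)
class Action:
    kind: str                           # "auto_remediate" or "notify"
    window: Optional[timedelta] = None  # time the application team has to remediate


def choose_action(environment: str, has_full_visibility: bool) -> Action:
    """Auto-remediate only when the compliance team understands the design."""
    if has_full_visibility:
        return Action("auto_remediate")
    # Without full visibility, notify and let the application team own the fix.
    return Action("notify", REMEDIATION_WINDOWS.get(environment, DEFAULT_WINDOW))
```

Recording the chosen action for each test resource is one way to feed the impact analysis described above before a policy is released broadly.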

Processing machine events against policies 

Controls (or policies) are developed as: 

  • detective controls
  • corrective controls
  • preventive controls 

Preventive controls are the most effective way to ensure compliance from the start, but they are also complex to implement. Cloud compliance solutions are specifically designed to address each of the above categories of controls. Controls are developed either as batch (near real time) evaluations or as event-based or API-based evaluations. The goal is to get as close to the event as possible. CloudTrail data, or any other cloud native log, is the simplest source of machine events, but it is important to note that cloud native logs also have a delay between the actual event and the time the event becomes available in the log.

Controls are composed of one or more policies applied against different machine events originating across regions, environments, and data classifications. Policies are rules that evaluate machine data against a set of conditions and have a boolean outcome. This implies that a resource can only be compliant or non-compliant within the scope of a control objective. The outcome is determined by a wide range of simple to complex rule sets that evaluate machine events from multiple perspectives: data, identity, access, and visibility. The process of evaluation and inference consequently identifies non-compliant resources, provides visibility through various notification channels, and generally suggests a pattern-based remediation.
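The relationship between controls and policies described above can be sketched in a few lines of Python. This is a simplified illustration, not a reference design: the `Policy` and `Control` names, the event fields, and the two example rules are all assumptions, and the sketch treats a resource as compliant only when every policy in the control passes.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Policy:
    """A rule with a boolean outcome for a single machine event."""
    name: str
    check: Callable[[dict], bool]


@dataclass(frozen=True)
class Control:
    """A control objective backed by one or more policies."""
    objective: str
    policies: Tuple[Policy, ...]

    def evaluate(self, event: dict) -> Tuple[bool, List[str]]:
        """Return (compliant, names of failed policies)."""
        failed = [p.name for p in self.policies if not p.check(event)]
        return (not failed, failed)


# Hypothetical policies over hypothetical event fields.
storage_control = Control(
    objective="Storage must be private and encrypted",
    policies=(
        Policy("no-public-access", lambda e: not e.get("public", False)),
        Policy("encryption-enabled", lambda e: e.get("encrypted", False)),
    ),
)
```

Returning the names of the failed policies alongside the boolean keeps the compliant or non-compliant outcome intact while giving notifications something specific to point at.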

Identifying non-compliance and taking action to remediate

All types of controls need one or more ways to report on the state of compliance. These reports can be further commoditized, customized, grouped, and distributed through multiple channels like email notifications, compliance dashboards, operational reports, and risk state data. As an organization's footprint on the cloud increases, the set of machine events can also grow significantly, making the reports more complex and harder to interpret. Most of the focus should be on reports that clearly show non-compliance by risk category, such as regulatory, operational, or security. These reports can also list non-compliance by resource type and by criticality, represented as high, medium, or low. Divisional and line-of-business (LoB) dashboards help drive prioritization conversations with LoBs and executive decisions, encouraging business and technology groups to come together on risk management and risk remediation.
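A report that groups non-compliance by risk category and criticality can start from a simple aggregation. The Python sketch below assumes each finding is a dictionary with hypothetical `risk_category` and `criticality` fields; it is meant to show the grouping idea, not a reporting product.

```python
from collections import Counter
from typing import List, Tuple

# Sort order so that the most critical findings appear first in the report.
CRITICALITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def summarize(findings: List[dict]) -> List[Tuple[Tuple[str, str], int]]:
    """Count non-compliant findings per (risk category, criticality)."""
    counts = Counter((f["risk_category"], f["criticality"]) for f in findings)
    return sorted(
        counts.items(),
        key=lambda item: (CRITICALITY_ORDER[item[0][1]], item[0][0]),
    )
```

The same counts could be broken down further by resource type or by line of business to back the dashboards mentioned above.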

Certain channels used to report non-compliance, like email notifications and Slack notifications, gradually become ineffective as volumes grow each day. It is hard for developers to watch every notification that comes their way and drive quick remediation. Policies that generate these notifications need product owners and product managers with solid experience in user-empathy-driven feature development, who can carve out effective, outcome-driven, and intuitive notifications. These notifications need to be supported with clear remediation steps, expressed as instructions that help end users remediate faster and more consistently. Compliance solutions can go one step further by assigning incident tickets to application teams, with default remediation windows and automated escalation processes.
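An incident ticket with a default remediation window and automated escalation might look like the following Python sketch. The `Ticket` fields and the escalation rule (one additional level for each full window that passes after the deadline) are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class Ticket:
    """Hypothetical incident ticket assigned to an application team."""
    resource_id: str
    opened_at: datetime
    window: timedelta  # default remediation window; must be positive
    remediation_steps: List[str] = field(default_factory=list)

    def escalation_level(self, now: datetime) -> int:
        """0 while inside the window, then +1 per full window overdue."""
        overdue = now - (self.opened_at + self.window)
        if overdue <= timedelta(0):
            return 0
        return 1 + overdue // self.window


ticket = Ticket(
    resource_id="bucket-123",
    opened_at=datetime(2024, 1, 1),
    window=timedelta(days=3),
    remediation_steps=["Disable public access", "Re-run the compliance check"],
)
```

Carrying the remediation steps on the ticket itself keeps the instructions next to the deadline, so the team does not have to search earlier notifications for them.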

Cloud compliance is a responsibility of every developer

Technology, architecture, operations, risk, regulatory compliance, and governance groups must come together to set, manage, and socialize security standards, provide continuous support and guidance, and create a transparent process for adopting new services. The result is a collaborative and supported compliance approach across the organization, with clear intent, roles, and responsibilities.

As organizations move towards being fully on cloud, compliance must play an integral role. Defining a cloud compliance framework within an organization will go through a journey of maturity and learning to balance roles, responsibilities, and accountability. However, the standards outlined in this post should be helpful in successfully implementing and managing an effective cloud compliance program. As this domain evolves further over time and with continued innovation, cloud compliance as a practice will become a fully operational entity for every consumer. 

Goutham Kadhaba, Director of Software Engineering, Enterprise Cloud Controls

Goutham Kadhaba is a Director of Technology at Capital One leading transformational work on Enterprise Controls & Cloud Compliance. He is focused on the development of automated and preventive controls that improve the end user experience while also strengthening the compliance posture. Goutham has also led the modernization of Card Imaging and Card Payment Processing platforms at Capital One. Prior to Capital One, Goutham led development of microservices and streaming-based platforms across the banking & financial services industry.
