How to build the right data governance strategy

Learn what data governance is, key components to consider in building a data governance model and best practices for enterprises.

Patrick Barch

September 8, 2023

A thoughtful data governance strategy is fundamental to helping companies organize and protect their data, while providing data users with accurate and easily discoverable information that can be used to serve their customers better.

The right data governance approach for your organization ensures data is secure, protected, accurate and accessible, equipping employees across the organization with centrally set standards and policies to use data responsibly while safeguarding the business. As more companies move to increase data sharing and self-serve analytics, a strong data governance policy produces consistent and trustworthy data that can be relied on for important business decisions.

What is data governance and why does it matter?

Data governance, a key part of data management, is the process of managing the collection, storage, security, availability and usage of data in alignment with internally set standards and governmental regulations. Data governance ensures that data is high quality, consistent and compliant across an organization. A well-governed data program is key to elevating data trust among users in an organization and reducing potential data security risks.

With the huge amount of data produced each day, about 5 quintillion bytes according to one estimate, organizations need to find ways to properly manage and secure their data assets to get value from them. Companies handling customer data hold a great responsibility to ensure compliance with regulatory requirements and privacy legislation while protecting their customers. Customers also expect companies to keep their data safe, with 87% saying they would not do business with a company if there were security concerns, according to McKinsey.

Despite the vast amounts of data available to organizations today, understanding and using that data for business value remains a challenge. Clear policies for organizing and governing data are essential to keeping data consistent, high-quality and usable across an organization.

There are many benefits of data governance, including:

More accurate analytics
Greater regulatory compliance
Improved data quality
Lower costs
Increased access to data

Core components of a data governance strategy

When forming a data governance strategy or policy, there are key components businesses should evaluate for their unique organizational structures and business goals.

The capabilities data governance policies should address include:

Ownership structure

Companies should determine the right ownership structure for each data set or data product. Ownership means a named person or team is accountable for a specific aspect of the data. Determining a structure answers questions such as “Who controls the publication of the data?” or “Who fixes problems with the data?” For example, in federated data governance, the right ownership structure must be put in place for the federated teams, including the technical person who looks into a service when it goes down, the business person who can explain the meaning of each field in a data set and the individual accountable for the risk of the data.
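As a rough illustration of the three roles described above, an ownership record could be modeled as a small data structure that flags any unassigned role. This is a minimal sketch; the class name, field names and example values are hypothetical and not taken from any specific tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DatasetOwnership:
    """Hypothetical ownership record for one data set."""
    dataset: str
    technical_owner: str  # looks into the service when it goes down
    business_owner: str   # explains the meaning of each field
    risk_owner: str       # accountable for the risk of the data

    def missing_roles(self) -> List[str]:
        """Return the names of roles that have not been assigned."""
        roles = ("technical_owner", "business_owner", "risk_owner")
        return [role for role in roles if not getattr(self, role).strip()]


# Example: the risk owner has not been assigned yet.
record = DatasetOwnership("card_transactions", "platform-oncall", "cards-analytics", "")
print(record.missing_roles())
```

A check like this could run when a data set is registered, so that gaps in accountability surface before the data is shared.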

Metadata curation

Setting common standards for metadata curation, such as the fields required when data is cataloged, makes data sets easier to find, understand and access. At the same time, organizations should build flexibility into metadata curation as not all types of data require the same level of curation.
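One way to picture required catalog fields with built-in flexibility is a set of curation levels, each with its own required fields. The sketch below assumes two hypothetical levels ("basic" and "full") and illustrative field names; a real catalog would define its own.

```python
# Hypothetical curation levels: not every data set needs the same metadata.
REQUIRED_FIELDS = {
    "basic": {"name", "owner"},
    "full": {"name", "owner", "description", "field_definitions"},
}


def missing_metadata(entry: dict, level: str) -> set:
    """Return the required fields that are absent or empty for this curation level."""
    return {field for field in REQUIRED_FIELDS[level] if not entry.get(field)}


entry = {"name": "card_transactions", "owner": "cards-analytics"}
print(missing_metadata(entry, "basic"))  # nothing missing at the basic level
print(sorted(missing_metadata(entry, "full")))
```

The same entry passes at one level and fails at another, which is the kind of flexibility the standard is meant to allow.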

Data quality

Businesses should continuously check shared data for quality issues and notify the relevant stakeholders when a concern is found. Common quality standards for shared data mean the data will go through schema conformance checks, completeness checks as data moves from point A to point B and business data quality checks, such as ensuring FICO scores fall within a predefined range.
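The three kinds of checks mentioned above can be sketched as small functions. This is an illustrative example only: the function names, the sample rows and the 300 to 850 score range are assumptions chosen for the demonstration, and the range would be configured per business rule in practice.

```python
from typing import Dict, List


def schema_violations(rows: List[dict], expected: Dict[str, type]) -> List[int]:
    """Return indexes of rows whose fields do not match the expected names and types."""
    bad = []
    for i, row in enumerate(rows):
        wrong_fields = set(row) != set(expected)
        if wrong_fields or any(not isinstance(row[k], t) for k, t in expected.items()):
            bad.append(i)
    return bad


def is_complete(source_count: int, target_count: int) -> bool:
    """Completeness check: every row that left point A arrived at point B."""
    return source_count == target_count


def out_of_range(rows: List[dict], field: str, low: int, high: int) -> List[int]:
    """Business check: return indexes of rows whose value falls outside [low, high]."""
    return [i for i, row in enumerate(rows) if not (low <= row[field] <= high)]


rows = [
    {"id": 1, "fico": 720},
    {"id": 2, "fico": 910},    # outside the assumed range
    {"id": "3", "fico": 650},  # id has the wrong type
]
print(schema_violations(rows, {"id": int, "fico": int}))
print(is_complete(source_count=3, target_count=len(rows)))
print(out_of_range(rows, "fico", low=300, high=850))
```

Each function returns the offending rows rather than raising an error, so the results can be routed to the relevant stakeholders.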

Lineage

Your downstream consumers need to understand how data moved from the source system to the state in which they’re using it. How you track lineage is an important concept, particularly in federated data management. The lineage of a data set helps consumers understand, for example, which fields truly are from the source versus the result of transformations or enrichments of the data. In regulatory use cases, one must often trace the full lineage of data from the point of creation all the way to the regulatory report it is currently participating in.
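To make the idea of tracing lineage concrete, the sketch below stores each data set's direct upstream inputs and walks back to the original sources. The data set names and the dictionary layout are hypothetical; real lineage tooling typically captures this automatically and at field level.

```python
from collections import deque
from typing import Dict, List

# Hypothetical lineage: each data set maps to its direct upstream inputs.
LINEAGE: Dict[str, List[str]] = {
    "regulatory_report": ["enriched_accounts"],
    "enriched_accounts": ["raw_accounts", "raw_bureau_scores"],
    "raw_accounts": [],
    "raw_bureau_scores": [],
}


def trace_upstream(dataset: str, lineage: Dict[str, List[str]]) -> List[str]:
    """Return every upstream data set, nearest first, using a breadth-first walk."""
    seen: List[str] = []
    queue = deque(lineage.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.append(current)
        queue.extend(lineage.get(current, []))
    return seen


def source_systems(dataset: str, lineage: Dict[str, List[str]]) -> List[str]:
    """Return only the upstream data sets that have no inputs of their own."""
    return [d for d in trace_upstream(dataset, lineage) if not lineage.get(d)]


print(trace_upstream("regulatory_report", LINEAGE))
print(source_systems("regulatory_report", LINEAGE))
```

A consumer of the report could use the first function to see every intermediate transformation and the second to identify the points of creation.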

Data protection

Organizations should determine how to keep all data protected. This impacts how federated teams can protect sensitive fields when they publish data to a shared environment. Businesses must consider how to enable self-service capabilities for individual teams to protect data as they share it. They should also be able to help downstream consumers understand the right entitlement to request in order to use a particular data set.
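A simple way to think about protecting sensitive fields in shared data is a mapping from each sensitive field to the entitlement required to read it. The sketch below is illustrative; the field names, entitlement names and masking value are all assumptions, and production systems would usually rely on tokenization or encryption rather than a placeholder string.

```python
from typing import Iterable, List, Set

# Hypothetical mapping of sensitive fields to the entitlement needed to read them.
SENSITIVE_FIELDS = {"ssn": "pii_read", "account_number": "account_read"}


def publish_view(record: dict, entitlements: Set[str]) -> dict:
    """Return a copy of the record with sensitive fields masked unless entitled."""
    view = {}
    for field, value in record.items():
        required = SENSITIVE_FIELDS.get(field)
        if required is None or required in entitlements:
            view[field] = value
        else:
            view[field] = "***"
    return view


def entitlements_needed(fields: Iterable[str]) -> List[str]:
    """Tell a consumer which entitlements to request for the fields they want."""
    return sorted({SENSITIVE_FIELDS[f] for f in fields if f in SENSITIVE_FIELDS})


record = {"customer_id": 7, "ssn": "123-45-6789", "account_number": "0001"}
print(publish_view(record, {"account_read"}))
print(entitlements_needed(["ssn", "customer_id"]))
```

The second function addresses the consumer side of the problem: it answers which entitlement to request before the data is needed.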

An overview of data governance models

A data governance operating model includes the frameworks and processes used to organize a company around its data governance activities. The operating model helps define the roles and responsibilities for data across the enterprise and different lines of business.

Historically, organizations have chosen centralized or decentralized data governance policies to guide their governance efforts. While these data governance models could still work for an organization today, more companies are moving to the cloud and seeking to take advantage of the benefits of cloud computing (such as greater scalability and storage). As a result, most businesses will find the traditional models limiting and should move toward a third model: federated data governance.

Federating data governance can help strike a balance between speed and enforcement of standards, reducing bottlenecks that limit access to valuable data and empowering users to own and manage that data in a compliant way.

Centralized data governance

In a centralized data governance model, a core enterprise data team is responsible for all the governance activities concerning the company’s most important shared data. This central team, for example, would select the central repository for everyone in the enterprise to use such as Amazon Web Services, Snowflake or Azure.

The benefit of a centralized data governance model is the ability to control and manage risk well across a business as a team of data experts and engineers use a central tool to govern data enterprise-wide. However, this model can present challenges at scale, leading to bottlenecks in accessing and acting on valuable data that is managed by a central team. Furthermore, when changes to datasets occur, as they often do, significant manual work is required to inform everyone using the data of the changes and to update the relevant systems.

Decentralized data governance

A decentralized model is one in which business teams are accountable for performing all data management activities. Each line of business can select its own platforms and tools for storing and managing data.

A decentralized model allows individual teams to work at their own pace and meet data needs much faster. However, without adherence to centrally defined standards and processes, a company can run into risk or privacy challenges due to inconsistencies in data across lines of business and data stakeholders. In the decentralized model, teams can move quickly, but they may not get the full value out of their data sets and may lack the consistency they would have if the data were managed centrally.

Federated data governance: Centralized data governance with decentralized data ownership

A third model strikes a balance between the centralized and decentralized approaches. Sometimes called federated data governance, this model involves a central team that sets the rules and defines policies, but business teams are empowered to execute against those policies. This model combines the approaches by federating data management responsibilities across lines of business while enforcing a set of enterprise-wide standards for governance. There is shared responsibility for data between the data governance team and individual business owners.

Federated data governance is a key tenet of data mesh, an architectural concept that seeks to decentralize data responsibility from a central team and distribute it to an organization’s business domains, or lines of business, so that companies can scale their data efforts with speed and agility. Data mesh is an important approach that has emerged to address the challenge of scaling data in an environment of exponentially growing data volumes and complexity.

In the federated data governance model, each business unit moves independently at its own speed, which is an important benefit today as more teams across the organization demand faster access to data to make important business decisions. At the same time, there are certain aspects of data governance that must stay with a central group in order to ensure critical standards are met. Businesses can ensure proper controls are in place where needed through a central body while allowing individual teams to scale quickly to their business needs. Later on, we’ll share more about how Capital One implemented this approach successfully.

Companies today are moving toward an interconnected ecosystem where they can learn from patterns across all data, feed information into machine learning models to understand customer behavior, and provide more personalized experiences while preserving privacy in how data is viewed and handled. The key to making this model work for your business is to automate adherence to standards through centralized tooling.
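The split between centrally set rules and team-level execution can be sketched as a policy merge in which a business team may add requirements but cannot remove central ones. The policy keys and check names below are hypothetical and chosen only to illustrate the idea.

```python
# Hypothetical central policy set by the governance team.
CENTRAL_POLICY = {
    "required_checks": {"schema", "completeness"},
    "mask_sensitive_fields": True,
}


def effective_policy(central: dict, domain: dict) -> dict:
    """Combine central and domain policies; a domain can tighten rules but not loosen them."""
    return {
        # Domain checks are added on top of the central ones, never instead of them.
        "required_checks": central["required_checks"] | set(domain.get("required_checks", ())),
        # Masking stays on if either side requires it.
        "mask_sensitive_fields": central["mask_sensitive_fields"]
        or domain.get("mask_sensitive_fields", False),
    }


# A line of business adds its own check and tries to switch masking off.
cards_policy = {"required_checks": {"fico_range"}, "mask_sensitive_fields": False}
print(effective_policy(CENTRAL_POLICY, cards_policy))
```

Encoding the merge in tooling, rather than in a written policy alone, is one way to automate adherence to the central standards.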

Steps to build a federated data governance strategy

Early in Capital One's data journey to go all-in on the cloud, we realized the centralized data governance model we were using with on-premises systems would not scale in the cloud. We also knew a fully decentralized model would be time-consuming, inefficient and potentially costly to manage. We needed a third way, which we came to know as federated data governance.

We needed technology that supported flexibility and self service for each line of business but within the guardrails of centrally set standards. The key was centralized policies and tooling that fulfilled our needs for rigorous governance, which we built through the following steps.

Step 1: Define common standards

We defined common standards for the five data areas: ownership, metadata, quality, lineage and protection. We addressed important processes such as capturing lineage, protecting sensitive data and governing access while using as few policies and rules as possible.

Step 2: Build a central platform

We built these policies into a central platform and set of tools with automated workflows. As a result, everyone from the data governance team to the data consumers knew they were adhering to enterprise standards as long as they were using the central platform. Without this platform, data governance policies were often confusing and difficult for individual teams to follow.

Step 3: Create a workflow process to follow

We created user experiences (e.g., data producer or risk manager) within those platforms based on the jobs that needed to be performed. We designed a process for teams to follow that was based on the way they thought about doing their jobs. A central workflow guided teams through the process in a configuration-based format. Data stakeholders no longer needed to turn to five or six different tools to get their work done. Everything they needed was in one place.
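A configuration-based workflow of this kind might look something like the sketch below, where each role sees only the steps for its job and the steps run in order. This is not a description of Capital One's platform; the role names, step names and handler functions are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical workflow configuration: steps are listed per role.
WORKFLOWS: Dict[str, List[str]] = {
    "data_producer": ["register_owner", "add_metadata", "run_quality_checks"],
    "risk_manager": ["review_sensitive_fields", "approve_publication"],
}

# Hypothetical step handlers: each returns True when its step succeeds.
HANDLERS: Dict[str, Callable[[dict], bool]] = {
    "register_owner": lambda d: bool(d.get("owner")),
    "add_metadata": lambda d: bool(d.get("description")),
    "run_quality_checks": lambda d: d.get("row_count", 0) > 0,
}


def run_workflow(role: str, dataset: dict) -> List[str]:
    """Run the role's steps in order and stop at the first one that fails.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for step in WORKFLOWS[role]:
        if not HANDLERS[step](dataset):
            break
        completed.append(step)
    return completed


# The description is missing, so the workflow stops after the first step.
print(run_workflow("data_producer", {"owner": "cards-analytics", "row_count": 10}))
```

Because the steps live in configuration, adding or reordering a step for one role does not require changes to the code that runs the workflow.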

Best practices for well-governed data

From our experience moving our data workloads to the cloud and building a well-managed data ecosystem, we came away with lessons about governing data that we believe apply to many businesses today seeking to optimize their data usage and scale in the cloud in a well-managed way.

Build with flexibility in mind

One of the biggest lessons we learned at Capital One was that as a company we needed to adhere to the same set of standards, but we also needed to build in flexibility for the various use cases and edge cases. We realized each line of business required flexibility to define policies and processes that worked for them. Otherwise, data stakeholders were more likely to work outside the tools we built.

Not all data is created equal

We adopted a mentality of “sloped governance based on risk,” meaning not all data is created equal, so different data sets need different levels of treatment. Some data sets will require less governance, while others, such as data that goes into a regulatory report, will need more rigorous standards.
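Sloped governance can be illustrated as a mapping from a risk tier to the controls that tier requires. The tiers, the rules for assigning them and the control names below are assumptions made for the example; each company would define its own.

```python
from typing import List

# Hypothetical controls per risk tier: higher risk means more rigorous governance.
CONTROLS_BY_TIER = {
    "low": ["owner_assigned"],
    "medium": ["owner_assigned", "metadata_complete", "quality_checks"],
    "high": [
        "owner_assigned",
        "metadata_complete",
        "quality_checks",
        "full_lineage",
        "access_review",
    ],
}


def risk_tier(feeds_regulatory_report: bool, has_sensitive_fields: bool) -> str:
    """Assign a tier using two illustrative risk signals."""
    if feeds_regulatory_report:
        return "high"
    if has_sensitive_fields:
        return "medium"
    return "low"


def required_controls(feeds_regulatory_report: bool, has_sensitive_fields: bool) -> List[str]:
    """Return the controls a data set must satisfy for its risk tier."""
    return CONTROLS_BY_TIER[risk_tier(feeds_regulatory_report, has_sensitive_fields)]


print(required_controls(feeds_regulatory_report=True, has_sensitive_fields=False))
print(required_controls(feeds_regulatory_report=False, has_sensitive_fields=False))
```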

Keep monitoring and adjusting

In our experience, balancing governance with enablement is a practice that requires ongoing monitoring and adjustment. As you continuously evaluate your data management, you will get feedback from users such as when a process is too onerous. Make sure to adjust and tweak your approach to address the patterns you see developing.

Empower through self-service and automation

A central platform with built-in governance enables you to keep policies consistent across the business and apply automation to key points of friction.

Building a data governance model for your enterprise

No two data governance strategies will look the same. A key task is choosing and applying a data governance model that accounts for the unique aspects of your company. Each company will have different standards for what constitutes the riskiest and least risky data.

The right data governance approach, flexible to your organization's needs, will unify your teams around a way to govern your data that instills greater data trust, quality, and accessibility throughout your enterprise.

Patrick Barch, Senior Director of Product Management for Capital One Software

Patrick Barch is a Senior Director of Product Management for Capital One Software. In this role, he is responsible for delivering Capital One Slingshot, which enables business teams to easily adopt and manage Snowflake while meeting governance needs. He spent the last 4 years at Capital One building data platform experiences enabling Analyst and Scientist teams to more easily find, evaluate, use, and collaborate on data products.