Data management: A modern, integrated approach
Build an effective data management strategy that considers three distinct experiences: data, governance and the customer.
June 1, 2022
Contributed by Salim Syed, Vice President and Head of Engineering, and Patrick Barch, Senior Director of Product Management for Capital One Software.
Rapid advances in the cloud, artificial intelligence and machine learning mean businesses today produce and consume unprecedented amounts of data. At the same time, the ability to make sense of great volumes of data from an endless number of sources has become paramount to a company’s long-term success.
Yet many businesses struggle to gain real value from their data, much of which lies dormant. Up to 73% of data in an enterprise goes unused for analytics, according to Forrester. This is where data management plays a critical role. With a data management strategy in place, companies can improve data quality and broaden access to data across the organization.
At the same time, traditional approaches to data management were not designed to handle advances in technology such as the cloud. For example, the incredible speed with which large amounts of data are processed in the cloud means users can analyze data much faster, but enterprises that still depend on a central team to manage data will run into bottlenecks that diminish the advantages of the cloud. The volume and complexity of data today require a new way to approach data management: as an integrated, holistic experience that considers the needs of various data stakeholders throughout an organization and empowers them to work together seamlessly. So what exactly is data management and what best practices should companies consider to achieve an effective, integrated strategy?
What is data management?
Data management is the execution of the policies, procedures, architectures and tools for using data in the most effective way for a business. Data management ensures that data is trustworthy and accessible across the organization for its entire lifecycle while enabling the safe, controlled and secure storage of a data asset. The goals of data management are to improve data processes and increase the business value of an organization’s data assets.
According to Experian’s 2021 Global Data Management report, organizations believe 32% of their data is inaccurate, and 55% of business leaders have trouble trusting their data assets. This reinforces the importance of a thoughtful data management approach. Without reliable data, executives can lose faith quickly in data analyses performed in the organization. Employees can spend up to 50% of their time looking for data and correcting errors on their own, many times without alerting the original department that houses the faulty data set.
Data management makes sure each piece of data across the organization is accurate, available, relevant and secure. Without a way to manage data effectively, organizations risk missed business opportunities, wasted resources and unnecessary costs.
There are many benefits to data management for an enterprise, including:
- A more productive workforce
- Data consistency across the enterprise
- Quicker delivery of new products and services
- Better access to data for everyone
- Greater control over infrastructure and processing costs
What’s included in data management?
There are many facets to data management with the goal of ensuring high data quality, access and security. Common aspects of data management include:
- Data governance establishes common standards and policies for data protection that align with governmental regulations.
- Data architecture describes the structure of an organization’s data assets and how the data moves from one point in the system to another.
- Data integration involves combining data from multiple sources and presenting it in a unified view.
- Data stewardship comprises the oversight practices that ensure the quality and fitness of data assets for business users.
- Data quality measures the fitness of a data asset for its intended purpose through factors such as consistency, accuracy and timeliness.
- Data lineage gives visibility into the way data flows through an organization including its origin, transformations and destination points.
- Data remediation is the organizing, cleansing, moving, deleting and archiving of data to ensure data quality and the correction of any errors or mistakes.
- Sensitive data scanning identifies sensitive and confidential data, where it’s located and how secure it is.
- Extract, transform, load (ETL) is a set of processes for moving data from its original source to a data warehouse.
- Master data management ensures that a shared set of critical data exists for an organization that is accurate and up to date.
- Data security is the practice of protecting data from harmful, unwanted and corrupting forces and breaches.
- Metadata management is an agreement on how to identify, classify and describe data to improve how users access and use the data.
- Data lifecycle management ensures a data asset is managed correctly from the time of its creation to archiving.
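Several of the facets above, such as data quality, reduce to concrete checks that can run in code. As a loose illustration (the function name, fields and thresholds here are hypothetical, not part of any specific product), a minimal quality scorer might measure completeness and timeliness:

```python
from datetime import datetime, timedelta, timezone

def quality_report(records, required_fields, max_age_days=30):
    """Score a data set on two common quality dimensions:
    completeness (required fields are populated) and
    timeliness (records were updated recently)."""
    total = len(records)
    complete = sum(
        1 for r in records
        if all(r.get(f) not in (None, "") for f in required_fields)
    )
    cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)
    fresh = sum(1 for r in records if r["updated_at"] >= cutoff)
    return {
        "completeness": complete / total if total else 0.0,
        "timeliness": fresh / total if total else 0.0,
    }
```

In practice such scores would feed a data catalog or monitoring dashboard, but even this small shape shows how "quality" becomes something measurable rather than a matter of opinion.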
How data management has evolved
Businesses used to manage their data assets in a straightforward way because there simply was not as much data or variation in data types. A centralized team handled company-wide requests for data while ensuring that the data adhered to set standards for privacy, safety and regulation. Most of the assets were in the form of structured data from transactional processes that were predictable in rate and volume and stored in data warehouses.
When big data became popular in the 2000s, unstructured and semi-structured data began to grow in prominence in the form of assets such as social media posts, images, text messages and emails. These various forms of big data introduced complexity and challenges to data management and analysis, leading to the need for data lakes, which allowed for the storage of data in its raw form. The rise of continuously streaming, real-time data added further challenges to data management.
Today, more companies are moving into the cloud, attracted by the promise of new business efficiencies, cost benefits and market advantages. That adoption has led to data management practices that try to address the strain businesses feel as they grapple with the volume and speed of data made possible by the cloud. Many organizations are trying to find the right balance between the traditional centralized approach to data management and new approaches that recognize the need for decentralized ownership of data across the enterprise.
Trends shaping data management
The growing number of data types and the diversity of sources, in addition to the need to provide differentiated customer experiences, are driving key trends in data management.
Increased demand from business teams for immediate access to all of the data they need is leading to the rise of self-serve analytics. This type of tooling empowers non-technical users to get to the data and analytics they’re seeking faster while freeing up data teams to spend more time on new projects that add value to the business. An example would be a centralized tool for data governance that hides the complexity of the processes and steps involved while making it easy and transparent to follow company-wide standards.
One of the biggest advantages of cloud adoption is cloud elasticity, which is the ability to add or reduce computing resources as needed. With an on-premises architecture, the storage and compute were tied together and had to be scaled together, but the costs were fixed. Now with infinitely scalable computing resources in the cloud paired with streaming volumes of data, costs can skyrocket and quickly get out of hand. This issue has put the need for operational efficiency at the heart of a modern data management strategy.
Decentralized vs. centralized data management
A decade ago, companies employed a fully centralized model of data management, where a central team managed data for the entire enterprise. But today, an avalanche of data from any number of sources is too much for a central team to manage effectively, and that team risks becoming a bottleneck for the company. More companies are experimenting with a decentralized approach to data management in which the responsibility for and management of data are dispersed among business lines. At the same time, organizations still need to enforce common standards around data protection for the sake of handling data properly and mitigating risk. Companies tackling data management today must find the right balance between centralization and decentralization.
Data as driving business value
Data is now a key player in creating business value. The focus in data management has shifted from simply managing data to deriving value from it. Often viewed synonymously with data governance, data management used to mainly mean fulfilling compliance and regulatory needs. But as businesses move to the cloud, exploding data sets mean data management becomes a means of getting the most benefit out of your data by understanding what it is, who owns it and what that means for business outcomes.
Integrated data management: Building an effective data management strategy
As the old models of data management become less relevant, new approaches to data management have emerged to try to answer the challenges that organizations face.
Data has permeated all aspects of an organization’s business practices, and today’s data management experience is no longer in the hands of a single team. In this diverse data landscape, more companies see the value of moving toward an integrated data management experience: one in which the various data stakeholders across an organization, each with their own data priorities, manage their own data while working together seamlessly.
For example, the data consumers, or data analysts and scientists, can get quick access to the data they need; the application owners can publish their data in a well-managed way; and the risk and infrastructure managers can seek ways to keep the data secure, running and cost efficient. One way this integrated experience can be accomplished is through a workflow-based model of data management where back-end processes are automated to make it easy to create a data set, grant access and build a data pipeline. In an integrated data management model, an organization can also set self-service tooling to enforce company standards for data.
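As a rough sketch of what such a workflow-based model could look like (the class, tiers and step names are invented for illustration, not any particular company’s tooling), every data set creation request can be routed through classification, registration and access-granting steps in a fixed order, so no governance step can be skipped:

```python
class DataWorkflow:
    """Toy workflow that bakes governance into data set creation:
    each request passes classification, registration and default
    access setup in order."""

    def __init__(self):
        self.catalog = {}

    def create_dataset(self, name, owner, contains_pii):
        record = {"name": name, "owner": owner, "contains_pii": contains_pii}
        # Every step runs for every request; users never call them directly.
        for step in (self._classify, self._register, self._grant_default_access):
            record = step(record)
        self.catalog[name] = record
        return record

    def _classify(self, record):
        # PII-bearing data lands in a stricter tier automatically.
        record["tier"] = "restricted" if record["contains_pii"] else "internal"
        return record

    def _register(self, record):
        record["registered"] = True
        return record

    def _grant_default_access(self, record):
        # Restricted data starts with owner-only access.
        if record["tier"] == "restricted":
            record["readers"] = [record["owner"]]
        else:
            record["readers"] = ["all-employees"]
        return record
```

The design point is that policy lives in the workflow, not in the user’s head: following the flow is the same thing as following the standard.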
In this integrated approach, data management becomes a collection of experiences, including the data, customer and governance experiences. The following best practices can help you build an effective data management strategy that takes into account each one.
Identify points of integration
Data management today isn’t about a single, central group that handles the data for a business. Data stakeholders are now in every line of business with their own data needs and journeys. There are many opportunities for miscommunication between these groups along the way if each is managing their own data independently. Data practitioners should handle the management of data as multiple, integrated experiences that work in concert with each other to make up a data management practice. To create this integrated experience, businesses should identify the points of integration between these groups and make sure to have a plan to connect them.
Federate data ownership
Data mesh is an architectural concept that distributes data ownership across different lines of business, or domains. The concept champions the idea of decentralizing the responsibility for data from a central data team to the domain teams that produce the data. In this model, centralized data governance standards would still apply across the organization. But ensuring data compliance would lie with the local teams that can apply these standards in the most appropriate ways for their areas.
This type of data ownership, sometimes called federated ownership, gives teams the autonomy to make the best decisions for their own data but within organization-wide data standards. Data mesh attempts to help organizations answer the challenge of scaling their data management and data analytics in line with the complexity and amount of data present in their systems.
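Federated ownership can be pictured in a few lines of code. In this minimal sketch (all names and standards are hypothetical assumptions), a central validation function encodes the organization-wide rules, while each domain publishes its own data products against it:

```python
# Organization-wide standards every domain must satisfy before publishing.
CENTRAL_STANDARDS = {
    "requires_owner": True,
    "requires_classification": True,
}

def validate_product(product):
    """Central governance check applied uniformly across all domains."""
    errors = []
    if CENTRAL_STANDARDS["requires_owner"] and not product.get("owner"):
        errors.append("missing owner")
    if CENTRAL_STANDARDS["requires_classification"] and not product.get("classification"):
        errors.append("missing classification")
    return errors

class Domain:
    """A line of business that owns and publishes its own data products,
    but only those that pass the central standards."""

    def __init__(self, name):
        self.name = name
        self.products = {}

    def publish(self, product):
        errors = validate_product(product)
        if errors:
            raise ValueError(f"{self.name}: {', '.join(errors)}")
        self.products[product["name"]] = product
```

The split mirrors the data mesh idea: the standard is defined once centrally, but the domain team decides when and how its data meets it.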
Treat data as a product
Data today no longer operates in a silo, and data teams are tasked with providing data assets that satisfy the needs of different lines of business, such as building new products or making business strategy decisions. In this way, data should be viewed not only as an asset to be collected but as a product that serves the “customers,” or data consumers, within a company. Data practitioners should treat their internal users with the same care they would provide to external customers. This means applying best practices from product management, like building in sprints and user-centered design. You should also understand the data user’s particular needs, concerns and customer journey. In treating data as a product, the data team’s responsibilities include providing data that supports good decision making and applying an SLA to the provided data.
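One concrete way an SLA on data can be expressed is a freshness guarantee: the data team promises the asset is never older than an agreed window. A simplified sketch, with an assumed (not prescribed) 24-hour default:

```python
from datetime import datetime, timedelta, timezone

def meets_freshness_sla(last_updated, max_staleness_hours=24):
    """Return True if the data set was refreshed within the agreed
    SLA window; the 24-hour default is an illustrative assumption."""
    age = datetime.now(timezone.utc) - last_updated
    return age <= timedelta(hours=max_staleness_hours)
```

A breach of this check would page the producing team, just as an outage on an external product would, which is exactly the mindset shift "data as a product" asks for.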
Recognize not all data is created equal
A data set used in a financial report carries a different level of risk and importance than one an analyst uploads to a sandbox environment. All data is not created equal, and your data management strategy should reflect that reality. A tiered governance strategy that accounts for these different levels of risk creates the flexibility needed to address the many scenarios that can arise. For example, you may reserve your most rigorous metadata curation policy for your most important data, such as regulatory use cases, while less risky data could require only 10% of that metadata curation. Applying a differentiated policy in this way keeps governance from becoming an unwelcome bottleneck, allowing people to move quickly on less risky data sets.
Make governance easy
Taking the time to bake your policies into an easy-to-use tool will go a long way in ensuring good data management practices across your business. This way, users can trust that as long as they follow the workflow and answer the on-screen questions, they will adhere to the data governance processes. At the same time, the risk management team can rest knowing that as long as the user is going through the central tool where the policy is baked in, the company’s data is operating within the set data policies.
What’s next in data management
As more companies see increased rates of data production and consumption, the landscape of data management will continue to evolve to take advantage of modern technologies while aligning with changing regulations and standards. For example, as automation and machine learning shift into high gear in data management, humans will get involved only to validate decisions. Thanks to cloud infrastructure, there will likely be increased data sharing between companies, which could result in new standards for joint data management and data governance. Lastly, companies will look to technology to make adhering to legislation easier across the organization, further enhancing data privacy.
Each organization must consider its own data goals to arrive at the data management architecture, strategy and tools that most benefit its business. Putting in the work now on a thoughtful, maintainable data management practice that recognizes the diversity of data experiences in your organization will position your business for data success.