4 lessons from securing data across the enterprise
From cloud migration to data enablement, here’s what we’ve learned about tokenization at scale.
Capital One was founded on a simple but powerful idea: data and technology could change the way banking works. Long before “data-driven” became a buzzword, using data to run our business was part of our DNA—and that mindset hasn’t changed. Today, our data strategy remains central to solving industry challenges and building products, services and experiences that make a real impact.
In 2012, guided by the belief that real-time data at scale, AI and machine learning, and the power of the cloud would fuel innovation and deliver personalized customer experiences, we began a bold transformation. Our goal was to operate like the bank a technology company would build.
We grew a world-class technology workforce that today consists of more than 14,000 technologists, most of them engineers. As part of this journey, we developed the frameworks and controls needed to operate securely and efficiently in the cloud. In 2020, we exited our last legacy, on-prem data center, becoming the first U.S. financial institution to go all-in on the public cloud.
From the start, data security has been embedded into every layer of our infrastructure. But as our cloud environment matured and the volume of sensitive data grew, so did the complexity of balancing protection with usability. We needed a solution that could safeguard sensitive information without disrupting the systems and workflows that rely on it. One that protected data at rest, in motion and in use.
When we couldn’t find a market solution fast and flexible enough to support our needs, we built it. Our in-house tokenization engine was designed to meet the dual challenge of security and usability. Today, we run over a hundred billion tokenization operations each month across hundreds of applications.
Here are four key practices we’ve learned along the way:
1. Align tokenization with data governance policies
Strong data governance is the foundation for managing sensitive information responsibly. Rather than layer tokenization policies on top of existing protection frameworks, collaborate with leadership to integrate them directly into the governance model. Incorporating tokenization at the company policy level can help clarify which data types require protection, how tokenization should be applied and where enforcement is needed.
Since different business units often consume and produce data in varied ways, cross-functional collaboration is essential. Engaging data owners and security stakeholders early helps prioritize what needs to be protected and streamline implementation across systems.
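To make this concrete, here is a minimal policy-as-code sketch in Python. Everything in it is hypothetical (the data classes, enforcement stages and rules are examples, not our actual policy); the point is that tokenization requirements live in the same registry the rest of the governance program reads.

```python
from dataclasses import dataclass
from enum import Enum


class Protection(Enum):
    """Controls a governance policy can require for a data class."""
    TOKENIZE = "tokenize"   # replace the value with a non-sensitive token
    REDACT = "redact"       # mask the value entirely
    NONE = "none"           # no transformation required


@dataclass(frozen=True)
class DataClassPolicy:
    data_class: str               # governance classification, e.g. "SSN"
    protection: Protection        # required control for this class
    enforce_at: tuple[str, ...]   # pipeline stages where it applies


# A governance team might maintain a registry like this alongside the
# broader data policy, so every pipeline reads one source of truth.
POLICY_REGISTRY = {
    "SSN": DataClassPolicy("SSN", Protection.TOKENIZE, ("ingest", "share")),
    "PAN": DataClassPolicy("PAN", Protection.TOKENIZE, ("ingest", "analytics")),
    "EMAIL": DataClassPolicy("EMAIL", Protection.REDACT, ("share",)),
}


def required_protection(data_class: str, stage: str) -> Protection:
    """Return the control a pipeline stage must apply for a data class."""
    policy = POLICY_REGISTRY.get(data_class)
    if policy is None or stage not in policy.enforce_at:
        return Protection.NONE
    return policy.protection
```

Because enforcement points query the registry rather than hard-coding rules, a policy change agreed with governance leadership propagates to every pipeline that reads it.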
2. Identify sensitive data types for prioritization
Clearly understanding what qualifies as sensitive data, and how it flows through systems, is critical for effective tokenization. In many cases, data that can directly identify an individual may be subject to stricter governance requirements.
Examples of data types that often fall into this category include:
- Government-issued identifiers
  - Tax Identification Number (TIN)
  - Social Security Number (SSN)
  - Individual Taxpayer Identification Number (ITIN)
  - Adoption Taxpayer Identification Number (ATIN)
  - Preparer Tax Identification Number (PTIN)
- Financial identifiers
  - Credit Card Primary Account Number (PAN)
  - Bank Account Number (BAN)
Once sensitive data types are identified, organizations can assess how best to handle them across existing pipelines in alignment with operational requirements.
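To illustrate prioritization, the sketch below scans records for candidate sensitive values using deliberately simplified patterns. A real discovery program needs validation, context and far broader coverage, so treat this as a starting point rather than a classifier.

```python
import re

# Simplified discovery sketch: pattern-match candidate sensitive values so
# teams can prioritize which fields to tokenize first. The patterns are
# intentionally minimal examples.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PAN": re.compile(r"\b\d{13,19}\b"),  # candidate card numbers
}


def luhn_valid(number: str) -> bool:
    """Luhn checksum, used to weed out random digit runs flagged as PANs."""
    digits = [int(d) for d in number]
    odd, even = digits[-1::-2], digits[-2::-2]
    total = sum(odd) + sum(sum(divmod(2 * d, 10)) for d in even)
    return total % 10 == 0


def scan_record(record: dict[str, str]) -> dict[str, list[str]]:
    """Return {field: [matched data classes]} for one record."""
    findings: dict[str, list[str]] = {}
    for field, value in record.items():
        for data_class, pattern in PATTERNS.items():
            for match in pattern.findall(value):
                if data_class == "PAN" and not luhn_valid(match):
                    continue
                findings.setdefault(field, []).append(data_class)
    return findings


print(scan_record({"note": "card 4111111111111111 on file", "id": "123-45-6789"}))
# {'note': ['PAN'], 'id': ['SSN']}
```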
3. Evaluate token usage within data pipelines
A key consideration for tokenization is how sensitive data is used in context, not just where it is stored. For example, generative AI models require large, representative datasets to produce accurate and reliable outputs. Tokenization allows these models to process realistic, non-sensitive data of the same “shape” and type as the underlying sensitive information, preserving both data utility and privacy.
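The following toy tokenizer illustrates the “shape” idea. It uses a simple vaulted design for readability and is not our production engine: each digit maps to a digit and each letter to a letter, so downstream schemas, format validations and model pipelines see values of the expected type.

```python
import secrets
import string

# Toy shape-preserving tokenizer: tokens keep the character pattern of the
# original value. This is a vaulted design (tokens stored in a lookup
# table) kept deliberately simple for illustration.
_VAULT: dict[str, str] = {}     # token -> original value
_REVERSE: dict[str, str] = {}   # original value -> token (for reuse)


def tokenize(value: str) -> str:
    """Replace `value` with a random token of the same character shape."""
    if value in _REVERSE:        # deterministic: same input, same token
        return _REVERSE[value]
    token = "".join(
        secrets.choice(string.digits) if ch.isdigit()
        else secrets.choice(string.ascii_uppercase) if ch.isupper()
        else secrets.choice(string.ascii_lowercase) if ch.islower()
        else ch                  # keep separators like '-' in place
        for ch in value
    )
    # (a real design would also guard against token collisions)
    _VAULT[token], _REVERSE[value] = value, token
    return token


def detokenize(token: str) -> str:
    """Recover the original value for authorized callers."""
    return _VAULT[token]


ssn_token = tokenize("123-45-6789")
print(ssn_token)                 # e.g. "507-91-3248": same shape, not the SSN
assert detokenize(ssn_token) == "123-45-6789"
```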
Engaging with stakeholders across data and analytics teams can help identify where tokenization supports business goals and where it might introduce friction if not thoughtfully applied.
4. Minimize exposure to highly sensitive data
Reducing exposure is often one of the most effective ways to lower risk. Organizations should first identify and remove non-essential sensitive data through redaction or deletion. Where complete removal isn’t possible, replace the sensitive data with a token. Tokenization swaps the sensitive value for a secure, non-sensitive one while preserving its format and usability, so the data is useless to potential bad actors yet platforms and systems continue to operate normally.
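One way to operationalize this ordering is a small triage step that applies the least-exposure treatment each field allows: delete first, redact next, and tokenize only what must remain usable. The field names and rules below are illustrative.

```python
# Illustrative minimization sketch: route each field to the least-exposure
# treatment the business can tolerate. Field names and rules are examples.
TREATMENT = {
    "internal_note": "delete",   # non-essential: remove entirely
    "email": "redact",           # retained, but not in readable form
    "ssn": "tokenize",           # needed downstream in original format
}


def minimize(record: dict[str, str], tokenize) -> dict[str, str]:
    """Apply delete/redact/tokenize treatments to one record."""
    out = {}
    for field, value in record.items():
        action = TREATMENT.get(field, "keep")
        if action == "delete":
            continue
        if action == "redact":
            out[field] = "***"
        elif action == "tokenize":
            out[field] = tokenize(value)
        else:
            out[field] = value
    return out


print(minimize(
    {"internal_note": "call back Tuesday", "email": "a@b.com", "ssn": "123-45-6789"},
    tokenize=lambda v: "TOKEN(" + v[-4:] + ")",  # stand-in for a real tokenizer
))
# {'email': '***', 'ssn': 'TOKEN(6789)'}
```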
Coordinating this effort with impacted teams helps ensure tokenization strategies are aligned with application requirements, governance standards and downstream system needs.
Tokenization is more than a security control; it’s a way to enable responsible data use across the full data lifecycle. When thoughtfully integrated into governance and architecture, tokenization provides a critical layer of security necessary for organizations to navigate an AI-first world. That’s exactly why Capital One Software brought Databolt to market.
Capital One Databolt is a powerful, patented tokenization solution that replaces high-risk data with secure, non-sensitive tokens, reducing exposure in the event of a breach. With Databolt, the underlying data format is preserved, enabling a range of use cases—from seamlessly running applications and managing third-party data sharing to adopting generative AI safely.
Databolt is a vaultless solution, offering businesses:
- Lightning-fast performance, with throughput up to 4 million tokens per second.
- An advanced security model in which sensitive data never leaves the business’s environment.
- Cloud-native architecture with a flexible deployment model that fits within a business’s unique infrastructure without causing disruption.
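Databolt’s algorithm isn’t described here, but the vaultless idea itself can be sketched generically: derive the token from a secret key rather than storing a token-to-value lookup table like the vaulted toy shown earlier. The one-way HMAC example below shows why no vault is needed; reversible vaultless designs typically use format-preserving encryption (for example, NIST FF1) so that authorized callers can detokenize.

```python
import hashlib
import hmac

# Generic illustration of vaultless tokenization: the token is derived
# from a secret key, so there is no token-to-value table to store, secure
# or synchronize. This one-way HMAC sketch is NOT Databolt's algorithm.
SECRET_KEY = b"example-key-from-a-kms-in-practice"  # hypothetical key


def vaultless_token(value: str) -> str:
    """Derive a same-length digit token from `value` with a keyed MAC."""
    digest = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).digest()
    # Map digest bytes onto digits; this toy assumes values of <= 32 chars.
    return "".join(str(b % 10) for b in digest)[: len(value)]


print(vaultless_token("4111111111111111"))  # deterministic: no table needed
```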
Whether you’re just starting to explore tokenization or looking to scale an existing approach, Databolt offers a robust solution to embed protection where it matters most, without compromising business agility. Request a demo to see how Databolt can support your data security goals.