Three tips for harnessing Snowflake’s data cloud
Learn how Capital One optimized Snowflake for real-time data analysis at scale with three best practices.
May 25, 2022
Since Capital One’s founding, data has been at the heart of the company. We believe in the power of data to drive insights and empower people to deliver real-time solutions to our millions of customers. Of course, the amount of data we analyze has skyrocketed over the last thirty years, making it harder to share data across the company and derive insights in real time. That’s where Snowflake and its cloud data platform come in.
Snowflake separated data storage from compute for relational data warehouses, and for a customer like Capital One, that means our hardware no longer limits us. Instead of racking up technical debt, we can focus on our data and what we do best: building personalized customer experiences that transform people’s relationship with their money.
Our unique journey with Snowflake
Capital One is the first U.S. bank to exit its on-premises data centers and go all in on the cloud, and we’ve written a great deal about our cloud journey and what we learned along the way. We exited our data centers because we did not want to be burdened by legacy technologies, technical debt, and silos.
As we modernized our data operations in the cloud, we adopted Snowflake so that our more than 6,000 analysts could run millions of queries with no degradation in performance. We needed performance that could scale elastically and near-instantly for any workload, and that would let multiple lines of business seamlessly share data with proper fine-grained access control.
With Snowflake, multiple analysts can access the same data without affecting each other’s performance, because each workload can run on its own compute (a virtual warehouse) over shared storage. In concrete terms, our credit card team can run intensive queries without slowing down other teams that are querying the same data. At the same time, ETL jobs can run on separate compute against that same data without impacting anyone else.
Snowflake is so flexible and efficient that you can quickly go from “data starved” to “data drunk.” To avoid that data avalanche and its associated costs, we put controls in place before our users migrated to Snowflake. For example, users cannot select a larger warehouse (compute cluster) than their workload requires, or run workloads in a way that never allows the warehouse to suspend.
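The controls described above could be expressed as a simple pre-provisioning check. What follows is a hypothetical illustration, not Capital One’s actual tooling. It assumes each team has an approved maximum warehouse size, and that auto-suspend must be enabled (a positive AUTO_SUSPEND value, in seconds) and kept under an assumed ceiling of 600 seconds. The function name, the size list, and the ceiling are all assumptions; Snowflake itself accepts more size names and aliases than the subset shown here.

```python
from typing import List, Optional

# Hypothetical subset of Snowflake warehouse sizes, ordered smallest to largest.
WAREHOUSE_SIZES = ["XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE", "XXLARGE"]

# Hypothetical policy ceiling: warehouses must suspend after at most 10 idle minutes.
MAX_AUTO_SUSPEND_SECONDS = 600


def validate_warehouse_request(
    size: str,
    auto_suspend_seconds: Optional[int],
    max_allowed_size: str,
) -> List[str]:
    """Return a list of policy violations; an empty list means the request passes."""
    violations = []
    size = size.upper()
    if size not in WAREHOUSE_SIZES:
        violations.append(f"unknown warehouse size: {size}")
    elif WAREHOUSE_SIZES.index(size) > WAREHOUSE_SIZES.index(max_allowed_size.upper()):
        violations.append(f"{size} exceeds the approved maximum of {max_allowed_size.upper()}")

    # In Snowflake, AUTO_SUSPEND = 0 or NULL means the warehouse never suspends.
    if auto_suspend_seconds is None or auto_suspend_seconds <= 0:
        violations.append("auto-suspend must be enabled")
    elif auto_suspend_seconds > MAX_AUTO_SUSPEND_SECONDS:
        violations.append(
            f"auto-suspend of {auto_suspend_seconds}s exceeds the {MAX_AUTO_SUSPEND_SECONDS}s limit"
        )
    return violations


# Example: a team approved for MEDIUM asks for a LARGE warehouse that never suspends.
print(validate_warehouse_request("LARGE", 0, "MEDIUM"))
# prints ['LARGE exceeds the approved maximum of MEDIUM', 'auto-suspend must be enabled']
```

Returning every violation at once, rather than stopping at the first, lets a self-service portal show the requester everything that needs fixing in one pass.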
As a technology company in financial services, we also operate in a regulated environment. Our model is unique, but the lessons from our journey with Snowflake apply to any company in a regulated industry and, in many ways, to any company that must get value from its data.
To generate the most value, organizations need to integrate tools like Snowflake thoughtfully and, at times, creatively. We figured out how to take advantage of Snowflake’s speed and flexibility while providing the kind of traceability a heavily regulated company like ours requires. And as a bank, we understand a thing or two about budgets, so we devised a way to ensure that usage stayed reasonable and on budget.
Best practices we’ve found for using Snowflake
1. Streamline onboarding with self-service processes and solutions.
To provision and manage compute and storage resources, Capital One created an online self-service portal that gives teams the resources they need. Our tools also fit into existing processes and organizational structures, which helps control costs and ensure that best practices are followed.
2. Track and optimize resources to control costs.
With Snowflake, your company unlocks access to data, and the flow of that data is the difference between a garden hose and a fire hose. It’s important to manage and track usage, because costs can rise quickly due to faulty configurations or inefficient queries. While it’s possible to centralize Snowflake access and provisioning through a department head, that approach can reintroduce the very bottlenecks you were trying to eliminate when you opted for Snowflake in the first place.
Capital One developed a dashboard that puts performance and cost management into the hands of key decision-makers without slowing down the overall process. It generates alerts when costs increase suddenly and automatically recommends a remediation. In short, you find out right away if something goes wrong.
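The article does not say how the dashboard decides that a cost increase is “sudden,” so here is one simple rule as a hypothetical sketch: flag a warehouse when its most recent day’s credit usage exceeds a multiple of its trailing average. The 7-day window, the 1.5x threshold, the function names, and the sample usage figures are all assumptions, and the remediation text is a generic hint rather than Capital One’s recommendation engine. In a real deployment, daily credits per warehouse could be aggregated from Snowflake’s SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY view (its CREDITS_USED column); this sketch takes the numbers as plain lists.

```python
from statistics import mean
from typing import Dict, List


def is_cost_spike(daily_credits: List[float], window: int = 7, threshold: float = 1.5) -> bool:
    """Flag the latest day if it exceeds `threshold` times the trailing `window`-day mean."""
    if len(daily_credits) < window + 1:
        return False  # not enough history to form a baseline
    baseline = mean(daily_credits[-window - 1:-1])
    if baseline == 0:
        return daily_credits[-1] > 0  # any usage after an idle baseline is a jump
    return daily_credits[-1] > threshold * baseline


def cost_alerts(usage: Dict[str, List[float]], window: int = 7, threshold: float = 1.5) -> List[str]:
    """Build one alert, with a generic remediation hint, per warehouse whose cost spiked."""
    alerts = []
    for warehouse, credits in usage.items():
        if is_cost_spike(credits, window, threshold):
            alerts.append(
                f"{warehouse}: {credits[-1]} credits today vs. trailing mean "
                f"{mean(credits[-window - 1:-1]):.1f}; review recent queries, "
                "warehouse size, and AUTO_SUSPEND"
            )
    return alerts


# Hypothetical daily credit usage for two warehouses (oldest first).
usage = {
    "ANALYTICS_WH": [10, 11, 9, 10, 12, 10, 8, 31],
    "ETL_WH": [40, 42, 38, 41, 39, 40, 40, 43],
}
for alert in cost_alerts(usage):
    print(alert)
```

A ratio against a trailing mean is deliberately crude; it ignores weekly seasonality, so a production version might compare against the same weekday or use a robust statistic such as the median.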
3. Govern securely and transparently.
As data becomes pervasive, ensuring that it is managed responsibly grows increasingly critical. As a heavily regulated company, Capital One has built a traceability solution into its Snowflake environment that enables approval workflows and data logging to support data remediation and retention use cases.
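The article does not describe how Capital One’s traceability solution works internally, so the following is only a generic sketch of the idea it names: an approval workflow in which every request, approval, and refused action is written to an append-only audit log. All class names, function names, actions, and users are hypothetical. A real system would keep the log in durable, tamper-evident storage rather than an in-memory Python list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class AuditLog:
    """Append-only record of who did what to which dataset, and when (UTC)."""
    entries: List[Dict[str, str]] = field(default_factory=list)

    def record(self, actor: str, action: str, dataset: str) -> None:
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "action": action,
            "dataset": dataset,
        })


@dataclass
class AccessRequest:
    requester: str
    dataset: str
    status: str = "pending"


def submit(requester: str, dataset: str, log: AuditLog) -> AccessRequest:
    """Open a new access request and log it."""
    request = AccessRequest(requester, dataset)
    log.record(requester, "requested", dataset)
    return request


def approve(request: AccessRequest, approver: str, log: AuditLog) -> None:
    """Approve a pending request; refused self-approvals are logged too."""
    # Separation of duties: nobody may approve their own request.
    if approver == request.requester:
        log.record(approver, "self_approval_denied", request.dataset)
        raise PermissionError("requesters cannot approve their own access")
    if request.status != "pending":
        raise ValueError(f"request is already {request.status}")
    request.status = "approved"
    log.record(approver, "approved", request.dataset)


log = AuditLog()
req = submit("analyst_a", "card_transactions", log)
approve(req, "data_steward_b", log)
for entry in log.entries:
    print(entry["actor"], entry["action"], entry["dataset"])
```

Logging the refused self-approval, not just successful actions, is what makes the trail useful for later review: auditors can see attempted policy violations as well as granted access.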
At Capital One, we’re believers in Snowflake because it enables us to harness data and put it to work. But as with any technology, companies must take a 360-degree look at what’s required to integrate it. Technology on its own is only a resource; as our use case demonstrates, we must also think creatively about how we use it.
Learn more about how we’re leveraging data at scale with Snowflake at www.CapitalOne.com/Snowflake-Summit.