Data-Centric AI for Customer-Focused Product Development

How a focus on data quality leads to better ML-powered products

Think of a time when you felt surprised and delighted by a personalized digital interaction with one of your favorite brands. When your music app served up a suggested playlist that was right on point. When your streaming service recommended a new movie that deeply moved you. Or a time when you were searching for a unique gift for that friend who is notoriously impossible to shop for, and the perfect item appeared in a carousel of recommended products.

The magic behind these experiences is most often attributed to artificial intelligence (AI) and machine learning (ML). But what often gets less attention is the foundational element that makes AI & ML work—data. The data that feeds recommender systems, computer vision models such as convolutional neural networks, and language models is generated by your engagement and interactions.

One core element of delivering intelligent products and solutions — and creating emotional connections with customers by solving their deepest needs — is ensuring that we are using high quality input data around how they interact with our products. Higher quality input data is achieved through numerous activities within the data and machine learning life cycles, such as:

  1. Making data cleansing, quality metadata, and data standardization systematic via automation (a minimal sketch follows this list)
  2. Ensuring customer data is safe and private at scale
  3. Augmenting and enriching data via programmatic labeling, active learning, and synthetic data 
  4. Systematically identifying bias in the data through model explainability and error analysis tools 
  5. Monitoring data and concept drift, creating feedback loops, and retraining models 
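
To make the first item concrete, here is a minimal sketch of what automated cleansing and standardization might look like for a table of customer interaction events. The column names, allowed event types, and rules are illustrative assumptions for this post, not a description of any production pipeline.

```python
import pandas as pd

# Illustrative schema only: column names and allowed values are assumptions.
REQUIRED_COLUMNS = {"customer_id", "event_type", "timestamp"}
VALID_EVENT_TYPES = {"page_view", "search", "click", "purchase"}

def standardize_events(events: pd.DataFrame) -> pd.DataFrame:
    """Apply simple, automatable cleansing and standardization rules."""
    missing = REQUIRED_COLUMNS - set(events.columns)
    if missing:
        raise ValueError(f"missing required columns: {missing}")

    cleaned = events.copy()
    # Standardize timestamps to UTC; unparseable values become NaT and are dropped.
    cleaned["timestamp"] = pd.to_datetime(cleaned["timestamp"], utc=True, errors="coerce")
    cleaned = cleaned.dropna(subset=["customer_id", "timestamp"])

    # Normalize categorical values and keep only known event types.
    cleaned["event_type"] = cleaned["event_type"].str.strip().str.lower()
    cleaned = cleaned[cleaned["event_type"].isin(VALID_EVENT_TYPES)]

    # Drop exact duplicates, e.g. from upstream retries.
    return cleaned.drop_duplicates()
```

Running checks like these automatically on every batch, and recording what was dropped as quality metadata, is what makes the process systematic rather than ad hoc.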

Models using high quality data will be able to better understand where customers experience friction and where their expectations shift. This will continually improve customer interactions with the product. 

Data is not static. Much like every person and the broader world around us, data changes constantly. Let’s take a closer look at some key challenges and considerations involved in building and deploying ML-driven products in real-world environments — environments where the data we rely on is based on continually evolving behaviors and external circumstances. 

When data facilitates a virtuous cycle

At Capital One, AI, ML, and data are central to how we build our products and services for customers and to how we run our company. And they’re at the core of our focus to better understand customer needs and deliver truly personalized experiences. 

For example, when customers log on to our website or mobile app, our conversational AI capabilities can help them find the information they may want. As the customer starts interacting with the website or app, the ML models behind our conversational AI engine start to learn, in real time, what the customer may be seeking.

Say a customer immediately navigates to their checking account, then switches over to CreditWise and spends a few minutes there. This might indicate they’re looking for insights relating to their spending habits and their credit score. This information helps fine-tune the recommendations our ML-powered mobile app will be prepared to offer should the customer have questions or want more information in the future. As the user engages, their interactions become data points that are fed into the models, creating a continuous loop that improves the models’ ability to predict customer intent and solve forthcoming needs.

It’s important to note here that while this level of insight can be incredibly useful for delivering personalized experiences, it’s critical to balance delivering real value to customers with maintaining high standards for user privacy, transparency, security and control.

This all works really well when user behaviors stick to easily identifiable patterns with consistent variables. But what happens when customer intent changes unexpectedly? And what happens in novel external environments—like a global pandemic—that cause the model’s outputs to run counter to expectations?

When data shifts and demands resilience

Many ML models are built on historical data, which isn’t necessarily a sufficient representation of our current environment. Relying too heavily on these models when the environment shifts can generate counterintuitive estimates; this degradation in performance is broadly referred to as model drift.

How is this problem solved? In a few words: real-time adaptation. It’s about the ability to react quickly to changes, taking feedback in real time and sending it back into the model so it can re-learn the customer’s current intent and expectations and adjust its outputs accordingly.
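
As one illustrative way to put numbers on that kind of shift, the sketch below computes a Population Stability Index (PSI) comparing a feature’s training-time distribution with recent production values. The synthetic spending data and the 0.2 threshold are assumptions made for the example; the threshold is a common rule of thumb, not a Capital One standard.

```python
import numpy as np

def population_stability_index(baseline: np.ndarray, recent: np.ndarray, bins: int = 10) -> float:
    """Measure how far recent feature values have shifted from the training baseline."""
    # Bin edges come from the baseline so both samples are measured on the same scale.
    edges = np.quantile(baseline, np.linspace(0.0, 1.0, bins + 1))
    # Stretch the outer edges so recent values outside the baseline range still land in a bin.
    edges[0] = min(edges[0], recent.min()) - 1e-9
    edges[-1] = max(edges[-1], recent.max()) + 1e-9

    baseline_pct = np.histogram(baseline, bins=edges)[0] / len(baseline)
    recent_pct = np.histogram(recent, bins=edges)[0] / len(recent)

    # Guard against empty bins before taking logs.
    baseline_pct = np.clip(baseline_pct, 1e-6, None)
    recent_pct = np.clip(recent_pct, 1e-6, None)
    return float(np.sum((recent_pct - baseline_pct) * np.log(recent_pct / baseline_pct)))

# Synthetic example: spending amounts drift upward after an external shock.
rng = np.random.default_rng(0)
train_spend = rng.lognormal(mean=3.0, sigma=0.5, size=10_000)
live_spend = rng.lognormal(mean=3.4, sigma=0.6, size=2_000)

psi = population_stability_index(train_spend, live_spend)
if psi > 0.2:  # roughly 0.2 is a common rule-of-thumb threshold for significant drift
    print(f"PSI = {psi:.2f}: significant drift, flag the model for review and retraining")
```

Monitoring a signal like this continuously, rather than waiting for model accuracy to visibly degrade, is what allows the feedback loop described above to kick in early.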

But to be adaptive and resilient requires sound frameworks, tools, data patterns, and governance practices. A few examples:

  • Creating feedback and engagement mechanisms for users, such as a thumbs up or thumbs down feature, can deliver valuable, high quality data to help train machine learning models and deliver better experiences for the user in the future (a minimal sketch of capturing this signal follows the list)
  • Standardizing tools, processes, and platforms can help data scientists and engineers more easily identify and access data and build on ML model deployment foundations
  • Ensuring real-time, structured data is standardized, available, and applied to high-impact customer use cases
  • Automating model monitoring and training processes to ensure consistent performance and well managed continuous integration and delivery
  • Focusing on enterprise-wide, high quality data by developing tools and solutions for both data producers and consumers
  • Building foundational model architectures and frameworks that enable teams to update and retrain models quickly in response to the problem, or use case, the customer expects to be solved
  • Establishing well managed, human-centered processes such as model governance, risk controls, peer review, and bias mitigation, so that responsible AI & ML practices keep humans at the center of all decision-making
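
To illustrate the first bullet, here is a minimal sketch of how a thumbs up or thumbs down signal might be captured as a labeled event that can later be joined back to model features for retraining. The field names and the local JSON-lines sink are assumptions made for the example; a real system would write to a governed, privacy-reviewed event stream.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

# Illustrative event shape only; field names are assumptions, not a real schema.
@dataclass
class FeedbackEvent:
    customer_id: str
    recommendation_id: str
    rating: str          # "thumbs_up" or "thumbs_down"
    timestamp: str

def record_feedback(customer_id: str, recommendation_id: str, thumbs_up: bool,
                    sink_path: str = "feedback_events.jsonl") -> FeedbackEvent:
    """Append a labeled feedback event that can later be joined with model features
    and used as training data in the next retraining cycle."""
    event = FeedbackEvent(
        customer_id=customer_id,
        recommendation_id=recommendation_id,
        rating="thumbs_up" if thumbs_up else "thumbs_down",
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    with open(sink_path, "a", encoding="utf-8") as sink:
        sink.write(json.dumps(asdict(event)) + "\n")
    return event

# Example: a customer gives a thumbs up to a suggested spending insight.
record_feedback("cust-123", "rec-456", thumbs_up=True)
```

The value of a mechanism like this is that every explicit rating arrives already labeled, which is exactly the kind of high quality input data the rest of this post argues for.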

When data should serve the needs of its human generators (hint: always)

Ultimately, a machine learning model is just a model. It will only be effective in production if it’s running on the right data, in the right environment, and being applied to the right use case. And that requires a constant focus on serving the needs, expectations, and goals of the human user through continuous engagement, feedback, and adjustment.

AI and ML will remain at the forefront of what’s possible in reimagining customer experiences. As our technological capabilities become more advanced and as the world becomes more complex, we need to keep a focus on building and deploying AI and ML in a responsible, well-managed way that puts people first.


Nurtekin Savas, VP, Head of Machine Learning and Data Science, Enterprise Platforms & Products

Nurtekin is the VP and Head of Machine Learning for the Enterprise Products and Platforms organization at Capital One. His main focus is using machine learning to increase data quality through intelligent automation and data labeling, delivering personalized and real-time customer experiences, reducing friction from customer journeys and curbing fraud. Prior to Capital One, he was the Head of AI for Fidelity Investments’ Personal Investing organization and led the data science and ML teams in Amazon Payment Products. Nurtekin has experience leading many areas in machine learning including personalization, recommender systems, targeting models, NLP, computer vision, content generation, data labeling, active learning, conversational AI, econometrics and model governance/ethics in ML. Nurtekin is an engineer with graduate degrees in Finance, Business and Data Science. Nurtekin is passionate about education and serves as an advisor to the Boston College Applied Economics program. Nurtekin lives in the Greater Boston area with his wife and two kids.