NeurIPS 2025: Capital One showcases leading AI research

Discover our advancements in AI efficiency, safety and scale at the leading global AI research conference.

Capital One is returning as a Platinum Sponsor for the 39th Annual Conference on Neural Information Processing Systems (NeurIPS), taking place in San Diego from Dec. 2–7, 2025. Our Applied Research team is helping pave the way to the frontier of AI in financial services, advancing breakthrough research that is adopted into the business to create real-world impact for more than 100 million customers. Our team applies deep technical innovation to foundational problems that range from responsible AI, multi-agent systems and reasoning to state-of-the-art language modeling and beyond. Building on our strong showing at NeurIPS 2024, we are excited to feature more than 20 accepted papers and numerous on-site expert sessions that showcase our ongoing role in driving fundamental advances in the field.

Featured expo engagements: Connecting research to practice

Capital One is leading several high-impact sessions at the NeurIPS Expo, providing attendees with an exciting opportunity to engage with us and learn about the research challenges we’re pursuing.

Expo Workshop - Exploring Trust and Reliability in LLM Evaluation

The current paradigm of Large Language Model (LLM) evaluation faces significant challenges in reliability due to issues like benchmark contamination, prompt overfitting and metrics that often fail to reflect real-world use. This workshop aims to reassert rigor in LLM evaluation by confronting common challenges that stem from over-reliance on standard benchmarks. The session seeks to chart a concrete path toward developing consistent evaluation methods, managing widespread contamination and defining robust data curation practices for trustworthy and utility-aligned LLM frameworks.

  • Date/Time: Tuesday, December 2nd, 12:00 PM - 1:30 PM PST (Upper Level Room 30A-E)

 

Expo Talk - GRAID: Synthetic Data Generation with Geometric Constraints and Multi-Agentic Reflection for Harmful Content Detection

Presented by Melissa Kazemi Rad, this session details GRAID (Geometric and Reflective AI-Driven Data Augmentation), a novel pipeline designed to overcome data scarcity in harmful text classification for guardrailing applications. GRAID leverages LLMs in a two-stage process—geometric constraint and multi-agentic reflection—to promote stylistic diversity and uncover difficult edge cases in harmful content. The talk will demonstrate how GRAID significantly improves downstream guardrail model performance, enhancing AI safety.

  • Date/Time: Tuesday, December 2nd, 8:30 AM - 9:30 AM PST (Upper Level Room 30A-E)

Main conference papers: Capital One-led research

Our researchers are dedicated to solving critical, industry-scale challenges in building safe and efficient language systems. The following two main conference papers were led by Capital One researchers. 

T1: A Tool-Oriented Conversational Dataset for Multi-Turn Agentic Planning

Capital One Authors: Amartya Chakraborty, Paresh Dashore, Nadia Bathaee, Anmol Jain, Anirban Das, Shi-Xiong Zhang, Sambit Sahu, Milind Naphade, Genta Indra Winata

Effective planning in multi-turn conversations, especially when complex API or tool dependencies are involved, remains a significant challenge for LLM agents. This paper introduces T1, a tool-augmented, multi-domain, multi-turn conversational dataset designed to rigorously evaluate agents' ability to coordinate tool use. T1 features an integrated caching mechanism and supports dynamic replanning, serving as a powerful benchmark for evaluating the performance of open-source language models in complex, tool-dependent scenarios; a hypothetical record sketch follows the session details below.

  • Spot Talk: Wednesday, December 3rd, 1:45 PM - 2:00 PM PST (Exhibit Hall A,B)

  • Poster Presentation: Thursday, December 4th, 4:30 PM - 7:30 PM PST (Exhibit Hall C,D,E)
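
As a purely hypothetical illustration of what one multi-turn, tool-dependent record in a dataset like T1 might look like, the sketch below shows a conversation whose second turn forces replanning; every field name is ours, not T1's actual schema.

```python
# Hypothetical record format (illustrative only; not T1's actual schema).
example_record = {
    "domain": "travel",
    "turns": [
        {"user": "Book the cheapest flight to SFO next Friday.",
         "plan": [
             {"tool": "search_flights",
              "args": {"dest": "SFO", "date": "next Friday"},
              "cache_hit": False},                  # result cached for reuse
             {"tool": "book_flight",
              "args": {"flight_id": "$search_flights.cheapest"},
              "depends_on": ["search_flights"]},    # explicit tool dependency
         ]},
        {"user": "Actually, make it Saturday.",     # triggers dynamic replanning
         "plan": [
             {"tool": "search_flights",
              "args": {"dest": "SFO", "date": "Saturday"},
              "cache_hit": False},
         ]},
    ],
}
```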

 

Dense Backpropagation Improves Training for Sparse Mixture-of-Experts

Capital One Authors: Sambit Sahu and Supriyo Chakraborty

Mixture-of-Experts (MoE) models offer scalability but suffer from sparse backward updates to the router, which can destabilize training. This paper introduces Default MoE, a lightweight approximation that gives the MoE router a dense gradient update while continuing to sparsely activate experts. Because the router receives a signal from every expert for each token, training improves significantly without substantial computational overhead.
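
To make the pattern concrete, here is a minimal PyTorch sketch of a dense-router MoE layer in the spirit of Default MoE: unselected experts contribute a cheap running estimate of their output instead of real compute, so every router logit still receives a gradient. The module, the EMA default and all sizes are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DefaultMoESketch(nn.Module):
    """Sketch of a top-k MoE layer whose router gets a dense gradient by
    substituting a cheap default (EMA) output for unselected experts."""

    def __init__(self, dim, n_experts, k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.k = k
        # Running estimate of each expert's mean output (the "default" signal).
        self.register_buffer("default_out", torch.zeros(n_experts, dim))

    def forward(self, x):                                  # x: (tokens, dim)
        probs = F.softmax(self.router(x), dim=-1)          # dense routing probs
        _, topi = probs.topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        mask = torch.zeros_like(probs)                     # 1 where expert is active
        for e, expert in enumerate(self.experts):
            sel = (topi == e).any(dim=-1)                  # tokens routed to expert e
            if sel.any():
                y = expert(x[sel])                         # sparse activation
                out[sel] = out[sel] + probs[sel, e:e+1] * y
                with torch.no_grad():                      # refresh the default
                    self.default_out[e] = 0.9 * self.default_out[e] + 0.1 * y.mean(0)
            mask[:, e] = sel.float()
        # Unselected experts contribute their default output: no expert compute,
        # yet the router's probability for every expert now receives gradient.
        return out + ((1 - mask) * probs) @ self.default_out
```

In a real MoE the experts would be MLPs and the default term would need care around load balancing; the point here is only that the dense term restores gradient flow to all router logits.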

Main conference papers: Advancing the frontier through collaboration

Capital One attracts leading technologists whose foundational work, conducted while they were students, continues to drive the field forward. These four papers, accepted to the Main Conference, highlight impactful research at the intersection of AI efficiency, safety and algorithmic fairness.

GPO: Learning from Critical Steps to Improve LLM Reasoning

This paper introduces Guided Pivotal Optimization (GPO), a novel fine-tuning strategy that enhances multi-step LLM reasoning by identifying the "critical step"—a pivotal moment where the model must proceed carefully—within a reasoning trajectory. Co-authored by current Capital One researcher Zelei Cheng during his time at Northwestern University, the method leverages the advantage function to locate this step, resets the policy to that point and prioritizes learning on new rollouts, consistently and significantly improving reasoning performance.
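
A loose Python sketch of that loop, assuming per-step advantage estimates are already available; `rollout_from` and `update_policy` are hypothetical stand-ins for the policy's sampling and optimization routines, and reading the critical step as the lowest-advantage step is our simplification.

```python
def guided_pivotal_update(trajectory, advantages, rollout_from, update_policy,
                          n_rollouts=4):
    # Treat the lowest-advantage step as the pivotal moment where the model
    # must proceed carefully (a simplified reading of "critical step").
    critical = min(range(len(advantages)), key=lambda t: advantages[t])
    prefix = trajectory[: critical + 1]             # reset policy to that point
    rollouts = [rollout_from(prefix) for _ in range(n_rollouts)]
    update_policy(prefix, rollouts)                 # prioritize learning here
    return critical

# Toy stand-ins: four reasoning steps, with step 2 ("derive") the weakest.
steps = ["parse", "plan", "derive", "answer"]
print(guided_pivotal_update(steps, [0.4, 0.1, -0.3, 0.2],
                            rollout_from=lambda p: p + ["retry"],
                            update_policy=lambda p, r: None))    # -> 2
```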

 

EraseFlow: Learning Concept Erasure Policies via GFlowNet-Driven Alignment

This research addresses the challenge of erasing harmful or proprietary concepts from text-to-image generators while preserving image quality. The paper introduces EraseFlow, the first framework that casts concept unlearning as exploration in the space of denoising paths and optimizes it with a GFlowNet. First authored by current Capital One researcher Naga Sai Abhiram Kusumba during his time at Arizona State University, the method learns a stochastic policy that steers generation away from target concepts while preserving the model’s prior, demonstrating an optimal trade-off between concept elimination and performance preservation.

 

On Hierarchies of Fairness Notions in Cake Cutting: From Proportionality to Super Envy-Freeness

This work considers the classic cake-cutting problem in the Robertson-Webb query model, focusing on computational complexity and fairness. First authored by current Capital One Software Engineer Arnav Mehra during his time at Purdue University, the research introduces two hierarchies of new fairness notions—Harmonic Coalition-Resistance (HCR) and Linear Coalition-Resistance (LCR)—to explore the middle ground between simple notions like proportionality and complex notions like envy-freeness and super envy-freeness. The paper provides theoretical bounds for computing these new allocation types, advancing research in algorithmic economics and game theory.

 

AI Progress Should Be Measured by Capability-Per-Resource, Not Scale Alone: A Framework for Gradient-Guided Resource Allocation in LLMs

This position paper challenges the dominance of "scaling fundamentalism," arguing for a fundamental reorientation toward capability-per-resource rather than capability alone. The work presents a theoretical framework where resource allocation is guided by gradient influence patterns. Co-authored by current Capital One researcher Yulun Wu during his time at UC Berkeley, the analysis demonstrates that updating only high-influence parameters strictly outperforms full-parameter tuning on a performance-per-resource basis, offering a path to dramatically improve efficiency and democratize access to cutting-edge AI capabilities.

 

Additionally, our 2025 Visiting Scholar, Dr. Furong Huang, an Associate Professor of Computer Science at the University of Maryland, contributed to four other NeurIPS papers accepted into the Main Conference track. Dr. Huang's expertise in trustworthy machine learning enhances our research collaborations and showcases the real-world impact of her work, effectively bridging the gap between academic theory and practical industry applications.

Fostering the next generation of AI talent: Workshop papers

Capital One is deeply committed to nurturing the next generation of AI/ML leaders through our Applied Research Internship Program (ARIP) and Data Science Internship Program (DSIP). The following papers will be presented at various NeurIPS workshops, demonstrating the quality of research our interns achieved during these programs.

Uncertainty as Feature Gaps: Epistemic Uncertainty Quantification of LLMs in Contextual Question Answering

This work focuses on quantifying epistemic uncertainty in the contextual Question Answering (QA) task. It proposes a theoretically grounded approach that interprets epistemic uncertainty as semantic feature gaps in a model's hidden representations relative to an idealized model. This method substantially outperforms state-of-the-art unsupervised and supervised Uncertainty Quantification (UQ) methods, achieving up to a 13-point PRR improvement while incurring negligible inference overhead.
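
As a loose illustration only (not the paper's estimator), one way to operationalize a "feature gap" is the residual left after projecting a hidden state onto a subspace of task-relevant features; here the basis is a random stand-in for features an idealized model would possess.

```python
import torch

def feature_gap_score(hidden, feature_basis):
    """Fraction of a hidden state unexplained by an 'ideal' feature subspace,
    read as a rough epistemic-uncertainty proxy (illustrative only).
    hidden: (dim,) pooled hidden state; feature_basis: (dim, k) orthonormal."""
    explained = feature_basis @ (feature_basis.T @ hidden)   # projection
    residual = hidden - explained                            # the "gap"
    return (residual.norm() / hidden.norm()).item()

# Hypothetical usage: higher score -> larger gap -> more uncertainty.
h = torch.randn(768)
basis, _ = torch.linalg.qr(torch.randn(768, 16))             # stand-in basis
print(feature_gap_score(h, basis))
```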

 

Leveraging Parameter Space Symmetries for Reasoning Skill Transfer in LLMs

This paper proposes an alignment-first approach to transfer advanced reasoning skills between LLMs, mitigating the negative interference common in task arithmetic. The method exploits the inherent permutation, rotation and scaling symmetries of Transformer architectures by matching layer-wise activations rather than weights, reducing the need for redundant fine-tuning efforts across evolving LLM families.
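
The matching step can be pictured with a small sketch of the general recipe (an assumption about this family of methods, not the paper's exact algorithm): align one model's hidden units to another's by maximizing activation correlation on shared probe inputs with a linear assignment.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_units(acts_a, acts_b):
    """Permutation of model B's units whose activations best correlate with
    model A's, so both layers share coordinates before any weight arithmetic.
    acts_a, acts_b: (samples, units) activations on the same probe inputs."""
    a = (acts_a - acts_a.mean(0)) / (acts_a.std(0) + 1e-8)
    b = (acts_b - acts_b.mean(0)) / (acts_b.std(0) + 1e-8)
    corr = a.T @ b / len(a)                      # (units, units) correlations
    _, cols = linear_sum_assignment(-corr)       # maximize total correlation
    return cols                                  # apply this permutation to B

acts = np.random.randn(256, 64)
print(match_units(acts, acts[:, ::-1])[:5])      # toy check: [63 62 61 60 59]
```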

 

Improving Consistency in Retrieval-Augmented Systems with Group Similarity Reward

This paper presents a reinforcement learning approach to improve output consistency in Retrieval-Augmented Generation (RAG) systems. The proposed Paraphrased Set Group Relative Policy Optimization (PS-GRPO) method leverages multiple rollouts across paraphrased sets to assign group similarity rewards. Empirical results demonstrate that PS-GRPO significantly improves RAG output consistency without compromising factual accuracy, offering a scalable solution to a critical reliability challenge.
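
A minimal sketch of the reward idea under stated assumptions: each sampled answer is scored by its average similarity to the answers generated for paraphrases of the same question, so consistent behavior earns higher reward. `similarity` is a hypothetical scorer (e.g., embedding cosine), and the surrounding GRPO machinery is omitted.

```python
def group_similarity_rewards(answers, similarity):
    """Reward each answer by its mean similarity to the rest of the group,
    where the group holds answers to paraphrases of one question."""
    rewards = []
    for i, a in enumerate(answers):
        others = [similarity(a, b) for j, b in enumerate(answers) if j != i]
        rewards.append(sum(others) / len(others))
    return rewards

# Toy similarity: do two answers agree on the key fact?
answers = ["Paris.", "The capital is Paris.", "It is Lyon."]
sim = lambda x, y: float(("Paris" in x) == ("Paris" in y))
print(group_similarity_rewards(answers, sim))    # inconsistent answer scores 0.0
```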

 

Bridging the Divide: End-to-End Sequence–Graph Learning

This paper introduces BRIDGE, a unified end-to-end architecture that jointly models sequential and relational data. BRIDGE couples a sequence encoder with a Graph Neural Network (GNN) under a single objective, enabling fine-grained token-level message passing among neighbors via a TokenXAttn layer. This approach consistently outperforms static GNNs, sequence-only baselines and existing temporal graph approaches in domains like friendship prediction and fraud detection.
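
That token-level message passing can be sketched as cross-attention from a node's sequence tokens to its neighbors' tokens. The wiring and shapes below are our reading of the idea, not BRIDGE's actual TokenXAttn layer.

```python
import torch
import torch.nn as nn

class TokenXAttnSketch(nn.Module):
    """Each token in a node's event sequence attends over every token of the
    node's neighbors, so relational context flows at token granularity."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.xattn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, node_tokens, neighbor_tokens):
        # node_tokens: (1, T_node, dim); neighbor_tokens: (1, T_nbrs, dim),
        # the latter a concatenation of all neighbor sequences.
        attended, _ = self.xattn(node_tokens, neighbor_tokens, neighbor_tokens)
        return node_tokens + attended              # residual fusion

layer = TokenXAttnSketch(dim=64)
out = layer(torch.randn(1, 12, 64), torch.randn(1, 40, 64))
print(out.shape)                                   # torch.Size([1, 12, 64])
```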

 

Optimizing Reasoning Efficiency through Prompt Difficulty Prediction

  • Capital One Authors: Bo Zhao (ARIP 2025), Berkcan Kapusuzoglu, Genta Indra Winata, Kartik Balasubramaniam, Sambit Sahu, Supriyo Chakraborty

  • Accepted to the First Workshop on Efficient Reasoning

This paper proposes a dynamic routing approach for reasoning language models to improve deployment efficiency. The method trains lightweight predictors of problem difficulty or model correctness to assign each problem to the smallest model likely to solve it. This difficulty-aware routing matches the performance of the largest models while using significantly less compute, enabling cost-efficient deployment.
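
A minimal sketch of the routing logic; the predictor, capacity thresholds and model stand-ins are illustrative assumptions rather than the paper's setup.

```python
def route(problem, difficulty_of, models):
    """Send a problem to the smallest model whose capacity covers its
    predicted difficulty; fall back to the largest model otherwise.
    models: list of (name, solve_fn, capacity), sorted small -> large."""
    d = difficulty_of(problem)
    for name, solve, capacity in models:
        if d <= capacity:
            return name, solve(problem)
    name, solve, _ = models[-1]
    return name, solve(problem)

models = [("small",  lambda p: f"small:{p}",  0.3),
          ("medium", lambda p: f"medium:{p}", 0.7),
          ("large",  lambda p: f"large:{p}",  1.0)]
print(route("2+2?", lambda p: 0.1, models))        # -> routed to "small"
```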

 

TimeSqueeze: Dynamic Patching for Efficient Long-Context Time Series Forecasting 

Transformer-based time series foundation models face a trade-off between preserving high-frequency information via expensive point-wise embeddings and reducing sequence length via lossy patch-based embeddings. This paper introduces TimeSqueeze, a hybrid architecture that combines the strengths of both point and patch embeddings through dynamic time-series compression. A lightweight state-space encoder first extracts fine-grained temporal features from the full-resolution series, which an adaptive patching module then prunes using variable-sized patches based on information density. This variable-resolution approach combines point-embedding fidelity with patching efficiency, significantly reducing the input sequence length for the Transformer backbone. Experiments confirm TimeSqueeze achieves state-of-the-art forecasting with substantial computational advantages.
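
A rough illustration of density-driven patching (assumed mechanics, not the paper's module): pool flat regions into large patches while keeping busy regions at fine resolution, shrinking what the Transformer must read.

```python
import numpy as np

def adaptive_patches(series, fine=4, coarse=16):
    """Mean-pool with a variable patch size chosen from local variability
    (mean-pooling is a stand-in for the learned patch embedding)."""
    patches, i = [], 0
    global_std = series.std() + 1e-8
    while i < len(series):
        window = series[i : i + coarse]
        # High local variability -> small patches; flat signal -> large ones.
        size = fine if window.std() > 0.5 * global_std else coarse
        patches.append(series[i : i + size].mean())
        i += size
    return np.array(patches)

t = np.linspace(0, 20, 512)
x = np.where(t < 10, np.sin(5 * t), 0.01 * t)      # busy half, flat half
print(len(x), "->", len(adaptive_patches(x)))      # e.g. 512 -> 80
```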

 

Continual Pre-training of MoEs: How robust is your router?

This large-scale empirical study investigates the robustness of MoE routers during Continual Pre-training (CPT). The results establish a surprising robustness to distribution shifts for MoEs, showing that they can maintain their sample efficiency and match the performance of a fully re-trained MoE at a fraction of the cost, even when continually pre-trained without using replay.

 

Spatio-Temporal Directed Graph Learning for Account Takeover Fraud Detection

This paper addresses Account Takeover (ATO) fraud by reformulating the problem as a node classification task on a large, dynamic graph. The research applies a Graph Neural Network (GNN), specifically GraphSAGE, to leverage spatial and temporal relationships among online user sessions, significantly outperforming existing production tabular models on key performance metrics and offering a scalable solution for real-time fraud detection.
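
For readers unfamiliar with the setup, here is a minimal PyTorch Geometric sketch of a GraphSAGE node classifier over session nodes; the feature sizes and toy session graph are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv

class SessionSAGESketch(torch.nn.Module):
    """Two-layer GraphSAGE classifier: sessions are nodes, shared entities
    (devices, accounts) supply edges, and each node is scored as ATO or not."""

    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.conv1 = SAGEConv(in_dim, hidden)
        self.conv2 = SAGEConv(hidden, 2)           # fraud vs. legitimate

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))      # aggregate neighbor sessions
        return self.conv2(h, edge_index)           # per-session class logits

# Hypothetical usage: 5 sessions with 16 features each, 4 directed edges.
x = torch.randn(5, 16)
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])
print(SessionSAGESketch(16)(x, edge_index).shape)  # torch.Size([5, 2])
```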

Additional research at workshops: Extending our reach

These papers, featuring Capital One authors, will also be presented across various NeurIPS workshops, contributing to fields ranging from hardware optimization and scientific discovery to advanced LLM safety and governance techniques.

RAFFLES: Reasoning-based Attribution of Faults for LLM Systems: Proposes RAFFLES, an iterative, reasoning-based evaluation framework for fine-grained fault attribution in agentic LLM pipelines, identifying the "who" (agent) and "when" (step) of an agentic system's failure and exceeding the previously reported best result by 27 percentage points. (Accepted to the Workshop on Multi-Turn Interactions in Large Language Models)

  • Capital One Authors: Chenyang Zhu, Spencer Hong, Jingyu Wu, Kushal Chawla, Charlotte Tang, Youbing Yin, Nathan Wolfe, Erin Babinsky, Daben Liu

 

Towards Scalable Meta-Learning of near-optimal Interpretable Models via Synthetic Model Generations: Introduces an efficient method for generating synthetic pre-training data to enable meta-learning of interpretable decision tree models, achieving strong performance with reduced computational cost. (Accepted to Generative AI in Finance Workshop)

  • Capital One Authors: Alexandre Day, Zhe Wu, Kyaw Hyponemyint

 

BEDTime: A Unified Benchmark for Automatically Describing Time Series: Formalizes and evaluates three tasks—recognition, differentiation and generation—that test a model's ability to describe time series using generic natural language, providing a standardized evaluation for time series reasoning systems. (Accepted to Learning from Time-Series for Health Workshop)

  • Capital One Authors: Bayan Bruss and Nam Nguyen

 

R3: Robust Rubric-Agnostic Reward Models: Introduces R3, a novel reward modeling framework that is rubric-agnostic and provides interpretable, reasoned score assignments to support transparent and flexible alignment of LLMs. (Accepted to the LLM Evaluation Workshop)

  • Capital One Authors: Genta Indra Winata

 

EconWebArena: Benchmarking Autonomous Agents on Economic Tasks in Realistic Web Environments: Presents EconWebArena, a benchmark with 360 curated economic tasks challenging autonomous agents to navigate live web environments, interpret content and extract precise, time-sensitive data through multi-step workflows. (Accepted to the LAW and Generative AI in Finance Workshops)

  • Capital One Authors: Zefang Liu

 

DecAEvolve: Decompose, Adapt, and Evolve, or, Three Pillars of Effective LLM-Based Scientific Equation Discovery: Introduces DecAEvolve, a framework that enhances the robustness and efficiency of evolutionary scientific equation discovery with LLMs by unifying symbolic decomposition with test-time reinforcement learning (RL) adaptation. (Accepted to AI4Science and MATH-AI Workshops)

  • Capital One Authors: Kazem Meidani

 

A Joint Learning Approach to Hardware Caching and Prefetching: Argues for training cache replacement and prefetching policies jointly rather than in isolation, proposing two alternative solutions, one sharing representations through a joint encoder and one based on contrastive learning, both of which significantly improve over uncoordinated caching and prefetching baselines. (Accepted to the Workshop on Machine Learning for Systems)

  • Capital One Authors: Nihal Sharma

 

Systemic Risk and Bank Networks: The Use of a Knowledge Graph with Generative Artificial Intelligence: Studies systemic risk and networks of top financial institutions during the 2008 Global Financial Crisis by building knowledge graphs from textual data (e.g., news) and numerical data using partial correlation matrices, NLP, embedding methods and generative AI. (Accepted to Generative AI in Finance Workshop)

  • Capital One Authors: Xiaohu Zhang

 

Influence Functions for Efficient Data Selection in Reasoning: Leverages influence functions, a technique for measuring the impact of individual training data points, to efficiently identify the most valuable data samples for improving the performance of reasoning models, as sketched below. (Accepted to the Foundations of Reasoning in Language Models Workshop)

  • Capital One Authors: Supriyo Chakraborty
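
As a simplified sketch of the general technique (a common first-order approximation, not necessarily this paper's estimator), a training example's influence on a target example can be scored by the dot product of their loss gradients.

```python
import torch

def influence_score(model, loss_fn, train_example, target_example):
    """Gradient-alignment influence: large positive values mark training
    points whose update direction also lowers the target example's loss."""
    def grad_vector(example):
        model.zero_grad()
        x, y = example
        loss_fn(model(x), y).backward()
        return torch.cat([p.grad.flatten() for p in model.parameters()])
    g_train = grad_vector(train_example)
    g_target = grad_vector(target_example)
    return torch.dot(g_train, g_target).item()

# Toy usage with a linear model and squared error; self-influence is >= 0.
model = torch.nn.Linear(3, 1)
mse = lambda pred, y: ((pred - y) ** 2).mean()
example = (torch.randn(1, 3), torch.randn(1, 1))
print(influence_score(model, mse, example, example))
```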

Connect with Capital One at NeurIPS 2025

We're excited to connect with you at NeurIPS 2025! Come visit us at booth #1219 where you can:

  • Explore our research: Dive deep into our latest advancements in AI and machine learning.

  • Discover career opportunities: Learn about applied research career paths at Capital One for researchers and engineers passionate about AI, and join our world-class team.

  • Engage with our team: Meet our researchers and AI experts, ask questions and discuss the future of AI in finance.

We look forward to an engaging and insightful NeurIPS 2025!


Capital One Tech

Stories and ideas on development from the people who build it at Capital One.