Capital One NLP research at EACL 2026

See our latest advances in instruction compliance, fault attribution and RAG architecture at the leading European NLP conference.

Capital One technologists are excited to participate in the 19th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2026) taking place in Rabat, Morocco, on March 24-29, 2026. As a premier venue for computational linguistics, EACL provides a vital forum for sharing research that addresses the complexities of natural language. 

Our technical presence at EACL 2026—comprising five Main Conference papers and one Industry Track paper—is rooted in a clear research priority: building AI systems that remain stable and transparent when reasoning through complex, real-world logic. By focusing on instruction compliance, automated fault detection and Retrieval-Augmented Generation (RAG), this work provides the foundational technical solutions necessary for the next generation of financial services. Through this participation, Capital One continues to contribute to global advancements in AI safety, system reliability and the creation of explainable AI architectures.

Technical foundations: Advancing AI reliability and reasoning

The following research examines the limits of how large language models (LLMs) adhere to complex constraints and identifies why agentic systems can sometimes falter over long horizons. This section includes work from Capital One researchers and first-authored papers from the 2025 Data Science Internship Program (DSIP).

Deconstructing Instruction-Following: A New Benchmark for Granular Analysis of Large Language Model Instruction Compliance Abilities
Capital One Authors: Alberto Purpura, Li Wang, Sahil Badyal, Eugenio Beaufrand, Adam Faulkner

Reliably ensuring LLMs follow complex instructions is a significant challenge. Part of the problem is how we measure it, since benchmarks often conflate instruction compliance with general task success. This paper introduces a novel evaluation framework and a dynamically generated dataset with thousands of prompts, each containing up to 20 application-oriented constraints, enabling granular analysis of the instruction-following abilities of LLMs. The research demonstrates that compliance varies significantly by constraint type and position, revealing specific model weaknesses like primacy and recency effects. These insights assist in developing LLMs for rigorous environments that require strict adherence to complex instructions.
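To illustrate the idea of granular, per-constraint measurement (as opposed to a single pass/fail task score), here is a minimal sketch. The constraint checkers and names below are invented for illustration and are not the paper's actual framework; the point is that each constraint is verified independently, so compliance can be broken down by constraint type and position.

```python
# Illustrative per-constraint compliance scoring (hypothetical checkers,
# not the paper's framework): each constraint is verified independently so
# compliance can be analyzed by constraint type rather than overall success.

def check_max_words(text, limit=50):
    """Length constraint: response must stay under `limit` words."""
    return len(text.split()) <= limit

def check_contains_keyword(text, keyword="refund"):
    """Content constraint: response must mention a required keyword."""
    return keyword in text.lower()

def check_no_bullet_points(text):
    """Format constraint: response must avoid bullet lists."""
    return not any(line.lstrip().startswith(("-", "*"))
                   for line in text.splitlines())

def per_constraint_compliance(response, constraints):
    """Return a {name: bool} report plus an aggregate compliance rate."""
    report = {name: fn(response) for name, fn in constraints}
    rate = sum(report.values()) / len(report)
    return report, rate

constraints = [
    ("max_words", check_max_words),
    ("keyword", check_contains_keyword),
    ("no_bullets", check_no_bullet_points),
]
response = "We will process your refund within five business days."
report, rate = per_constraint_compliance(response, constraints)
```

A per-constraint report like this is what makes position-sensitivity effects (primacy, recency) observable at all: with only an aggregate score, the location of a violated constraint is invisible.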


RAFFLES: Reasoning-based Attribution of Faults for LLM Systems
Capital One Authors: Chenyang Zhu, Jingyu Wu, Kushal Chawla, Youbing Yin, Nathan Wolfe, Erin Babinsky, Daben Liu

Identifying the exact point of failure in multicomponent LLM agentic systems is a persistent challenge. This paper presents RAFFLES, an evaluation architecture that uses iterative reasoning to automate fault detection. By utilizing a central “judge” to investigate faults and specialized “evaluators” to verify that reasoning, RAFFLES identifies failure points with significantly higher accuracy than existing baselines. This work offers a path toward enhancing human review with automated diagnostics for autonomous systems.
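The judge/evaluator loop can be sketched in miniature. In this toy version (the component names, trace format and heuristics are illustrative stand-ins for the paper's LLM-based components), a judge proposes a candidate faulty step and evaluators confirm or reject the hypothesis, iterating until attribution succeeds:

```python
# Toy judge/evaluator fault-attribution loop in the spirit of RAFFLES.
# The trace format and heuristics are invented for illustration; the real
# system uses LLMs as the judge and evaluators.

def judge_propose(trace, rejected):
    """Judge: propose the earliest not-yet-rejected step with a suspect output."""
    for i, step in enumerate(trace):
        if i not in rejected and step["output"] is None:
            return i
    return None

def evaluator_verify(trace, i):
    """Evaluator: confirm the fault only if every upstream step succeeded."""
    return all(trace[j]["output"] is not None for j in range(i))

def attribute_fault(trace, max_iters=10):
    """Iterate propose-and-verify until a fault hypothesis is confirmed."""
    rejected = set()
    for _ in range(max_iters):
        hypothesis = judge_propose(trace, rejected)
        if hypothesis is None:
            return None
        if evaluator_verify(trace, hypothesis):
            return trace[hypothesis]["agent"]
        rejected.add(hypothesis)
    return None

trace = [
    {"agent": "planner", "output": "plan"},
    {"agent": "retriever", "output": None},   # the root-cause fault
    {"agent": "writer", "output": None},      # downstream symptom only
]
faulty = attribute_fault(trace)
```

Note how the verification step distinguishes the root cause (the retriever) from downstream symptoms (the writer), which is exactly the distinction that makes fault attribution harder than failure detection.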


DF-RAG: Enhancing RAG for Question Answering by Balancing Relevance and Diversity of Retrieved Chunks
Capital One Authors: Saadat Hasan Khan (DSIP 2025), Jingyu Wu, Youbing Yin, Erin Babinsky, Daben Liu

Retrieval-augmented generation (RAG) performance is often limited by redundant information during the retrieval phase. This work, led by a 2025 Data Science Intern, introduces DF-RAG, a pipeline that dynamically balances the relevance and diversity of retrieved chunks for each query at test time. Using a maximal marginal relevance (MMR)-based scoring mechanism and LLM-driven planning, DF-RAG recovered up to 90% of the gap between standard retrieval and the theoretical upper bounds for information recall in multi-hop benchmarks.
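The relevance-diversity trade-off at the heart of MMR is easy to see in a minimal example. The sketch below implements standard greedy MMR selection (this is textbook MMR, not DF-RAG's exact mechanism, and the vectors are made up): each pick maximizes relevance to the query minus similarity to chunks already selected, so near-duplicate chunks are penalized.

```python
# Minimal greedy maximal marginal relevance (MMR) reranker. This is the
# standard MMR formulation, shown for illustration; DF-RAG's actual scoring
# and LLM-driven planning are more involved.
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def mmr_select(query_vec, chunk_vecs, k, lam=0.5):
    """Greedily pick k chunks maximizing lam*relevance - (1-lam)*redundancy."""
    selected, remaining = [], list(range(len(chunk_vecs)))
    while remaining and len(selected) < k:
        def score(i):
            rel = cosine(query_vec, chunk_vecs[i])
            red = max((cosine(chunk_vecs[i], chunk_vecs[j])
                       for j in selected), default=0.0)
            return lam * rel - (1 - lam) * red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

query = [1.0, 0.0]
# Chunks 0 and 1 are near-duplicates; chunk 2 is orthogonal to the query.
chunks = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
# Weight diversity slightly more (lam=0.4) so the near-duplicate is skipped.
picked = mmr_select(query, chunks, k=2, lam=0.4)
```

With pure relevance ranking the top two picks would be the two near-duplicates; MMR instead picks the most relevant chunk and then the diverse one, which is the behavior that matters for multi-hop questions whose answers span different passages.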


ART: Adaptive Reasoning Trees for Explainable Claim Verification
Capital One Authors: Sahil Wadhwa, Himanshu Kumar, Guanqun Yang (DSIP 2025), Abbaas Alif Mohamed Nishar, Pranab Mohanty, Swapnil Shinde, Yue Wu

The opacity of LLM decision-making remains a barrier to adoption in fields that demand auditable reasoning. This paper proposes ART (adaptive reasoning trees), a hierarchical method for claim verification. The method branches a root claim into supporting and attacking arguments, which are adjudicated via a pairwise tournament by a judge LLM to derive a transparent, contestable verdict. ART’s structured reasoning consistently outperforms chain-of-thought (CoT) methods in explainable AI tasks.
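A pairwise tournament over argument branches can be sketched as follows. The `judge` function here is a stub standing in for a judge LLM, and the argument and evidence structures are invented for illustration; the point is that the verdict is assembled from individually inspectable pairwise decisions, which is what makes it contestable.

```python
# Toy pairwise-tournament adjudication over supporting vs. attacking
# arguments, loosely modeled on ART's setup. `judge` is a stub for a judge
# LLM; the argument structures are illustrative.

def judge(support, attack):
    """Stub judge: prefer whichever argument cites more evidence items."""
    if len(support["evidence"]) >= len(attack["evidence"]):
        return "support"
    return "attack"

def verify_claim(supporting, attacking):
    """Pair off arguments; the side with more pairwise wins decides the verdict."""
    wins = {"support": 0, "attack": 0}
    for s, a in zip(supporting, attacking):
        wins[judge(s, a)] += 1
    verdict = "supported" if wins["support"] > wins["attack"] else "refuted"
    return verdict, wins

supporting = [
    {"text": "Report A confirms the figure.", "evidence": ["A"]},
    {"text": "Two independent audits agree.", "evidence": ["B", "C"]},
]
attacking = [
    {"text": "A blog post disputes it.", "evidence": []},
    {"text": "One audit used outdated data.", "evidence": ["D"]},
]
verdict, wins = verify_claim(supporting, attacking)
```

Because each pairwise decision is recorded, a reviewer who disagrees with the verdict can trace it back to, and contest, a specific matchup rather than an opaque end-to-end judgment.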

Scalable personalization: Profile-grounded synthetic data

Bridging the gap between fundamental research and industry application is a priority for Capital One’s academic partnerships. This work was developed at the USC-Capital One Center for Responsible AI and Decision Making in Finance (CREDIF), a joint research center that advances state-of-the-art research and foundations for algorithmic, data and software innovations in responsible AI and its applications to finance.

GRAVITY: A Framework for Personalized Text Generation via Profile-Grounded Synthetic Preferences
Capital One Authors: Wenqing Zheng and Daniel Barcklow

Personalization in LLMs typically requires manual human feedback. This paper introduces GRAVITY, a framework for generating synthetic, profile-grounded preference data. By integrating cultural and psychological frameworks (such as the Big Five OCEAN traits), GRAVITY synthesizes preference pairs to guide personalized content generation. In evaluations with 400 users, GRAVITY achieved higher preference gains than standard fine-tuning, providing a scalable path for personalization without relying on manual annotation.
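The core idea of profile-grounded preference synthesis can be sketched in a few lines. Everything below is hypothetical: the trait-to-marker mapping, scoring rule and candidate texts are invented for illustration, with only the Big Five trait names taken from the description above. A profile scores candidate generations, and the best and worst become a `chosen`/`rejected` preference pair of the kind used for preference-based fine-tuning.

```python
# Hypothetical sketch of profile-grounded preference-pair synthesis in the
# spirit of GRAVITY. The profiles, lexical markers, and scoring rule are
# invented for illustration; only the Big Five trait names are from the text.

def style_score(text, profile):
    """Score a candidate by counting trait markers it contains."""
    score = 0
    for markers in profile.values():
        score += sum(1 for m in markers if m in text.lower())
    return score

def make_preference_pair(profile, candidates):
    """Rank candidates for this profile: best is 'chosen', worst 'rejected'."""
    ranked = sorted(candidates,
                    key=lambda c: style_score(c, profile), reverse=True)
    return {"chosen": ranked[0], "rejected": ranked[-1]}

# A profile high in openness, mapped to hypothetical lexical markers.
profile = {"openness": ["novel", "explore"], "conscientiousness": ["plan"]}
candidates = [
    "Explore this novel approach and see where it leads.",
    "Stick to the usual routine.",
]
pair = make_preference_pair(profile, candidates)
```

Because the pairs are generated from profiles rather than collected from annotators, the same pipeline scales to arbitrarily many synthetic users, which is what removes the manual-feedback bottleneck described above.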

Applied engineering: Adaptable life cycles for generative AI

Our participation in the Industry Track focuses on the practicalities of developing and maintaining AI systems at scale within complex operational environments.

Lessons from the Field: An Adaptable Lifecycle Approach to Applied Dialogue Summarization
Capital One Authors: Kushal Chawla, Alfy Samuel, Shixiong Zhang, Ayushman Singh, Chenyang Zhu, Erin Babinsky, Jonah Lewis, Keasha Safewright, Pengshan Cai, Sambit Sahu, Sangwoo Cho, Scott Novotney

Summarizing multiparty dialogues is a critical industrial capability, yet static benchmarks rarely reflect the evolving requirements of real-world use. This industry case study shares insights from developing an agentic summarization system, covering evaluation methods for subjective tasks, component-wise optimization and the impact of upstream data bottlenecks. The work provides a road map for building summarization systems that remain adaptable to evolving business needs and technical constraints.

Learn more about Capital One’s research at EACL 2026

Researchers and authors are available throughout the conference to discuss these advancements and their application to the financial services landscape. To learn more about AI and machine learning at Capital One, visit our careers page.

We look forward to an engaging EACL 2026 in Rabat!

