Capital One’s contributions to NLP research at ACL 2025
Explore Capital One’s accepted papers and research impact at the premier conference for NLP and computational linguistics.
Capital One technologists are excited to participate in the 63rd Annual Meeting of the Association for Computational Linguistics, or ACL 2025, taking place July 27 through August 1 in Vienna, Austria. As a premier global NLP conference, ACL brings together leaders in computational linguistics research from academia and industry.
For years, Capital One has been building a robust foundation in technology, data and machine learning, positioning us at the forefront of enterprises leveraging AI. Our modern tech stack, coupled with our talent, is helping us tackle some of the most pressing challenges in financial services. Our presence at ACL allows us to share research stemming from these capabilities and contribute to the broader scientific community.
At ACL 2025 we’re proud to share four accepted papers spanning research on language model scaling laws, multilingual NLP challenges and inclusive dataset creation.
Capital One’s ACL 2025 oral presentation: scaling laws and zero-sum learning
A significant highlight of Capital One's presence at ACL 2025 is the selection of our paper, Training Dynamics Underlying Language Model Scaling Laws: Loss Deceleration and Zero-Sum Learning, for an oral presentation. This work was first authored by Andrei Mircea, a participant in our 2024 Applied Research Internship Program (ARIP), with co-authorship from Capital One applied researchers Supriyo Chakraborty and Nima Chitsazan. This contribution highlights the types of challenging AI problems explored through research conducted within our Ph.D. internship programs.
This paper aims to explain the mechanisms by which scaling improves large language models (LLMs), focusing specifically on their training dynamics. While prior work has explained scaling laws in terms of intrinsic model capacity, data distribution properties or asymptotic behavior, we ground our explanation in the dynamics of training itself.
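For context, a loss curve that follows a simple power law in training steps plots as a straight line in log-log space; the schematic form below is the standard power-law shape from the scaling-laws literature, not an equation taken from the paper:

```latex
L(t) \approx C\, t^{-\alpha}
\qquad\Longrightarrow\qquad
\log L(t) \approx \log C - \alpha \log t
```

A change in the effective exponent \alpha partway through training therefore appears as a kink between two line segments, which is exactly the pattern described next.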
We find that language models undergo loss deceleration early in training: an abrupt slowdown in the rate of loss improvement that produces piecewise linear behavior of the loss curve in log-log space. Scaling up the model mitigates this transition by (1) delaying the onset of deceleration and (2) improving the log-log rate of loss reduction after deceleration.
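To make the piecewise linear observation concrete, here is a minimal sketch that locates a slope change in a loss curve by fitting two line segments in log-log space. The `find_deceleration` helper is hypothetical and written for illustration; it is not the estimation procedure used in the paper:

```python
import numpy as np

def find_deceleration(steps, losses):
    """Estimate where a loss curve 'decelerates' by fitting two linear
    segments in log-log space and choosing the breakpoint that minimizes
    total squared error. Illustrative only, not the paper's method."""
    x, y = np.log(steps), np.log(losses)
    best_i, best_err = None, np.inf
    for i in range(2, len(x) - 2):  # candidate breakpoints
        err = 0.0
        for xs, ys in ((x[:i], y[:i]), (x[i:], y[i:])):
            slope, intercept = np.polyfit(xs, ys, 1)
            err += np.sum((ys - (slope * xs + intercept)) ** 2)
        if err < best_err:
            best_i, best_err = i, err
    return steps[best_i]  # estimated onset of deceleration

# Synthetic curve: power-law exponent -0.5 before step 200, -0.2 after.
steps = np.arange(1, 1001)
losses = np.where(steps < 200, steps ** -0.5, 200 ** -0.3 * steps ** -0.2)
print(find_deceleration(steps, losses))  # ~200
```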
The paper attributes loss deceleration to a form of degenerate training dynamics termed zero-sum learning (ZSL). In the ZSL state, gradients from different training examples become systematically opposed, largely canceling one another out and shrinking the cumulative learning signal in each batch. In other words, efforts to reduce loss on one subset of training examples inadvertently degrade performance on another, creating a bottleneck that severely impedes overall learning progress.
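As a rough illustration of what "systematically opposed gradients" means in practice, the PyTorch sketch below compares the norm of the summed per-example gradients to the sum of their norms. The `gradient_opposition` helper is hypothetical, not the paper's diagnostic; a ratio near zero indicates heavy cancellation of the kind ZSL describes:

```python
import torch

def gradient_opposition(model, loss_fn, xs, ys):
    """Return ||sum_i g_i|| / sum_i ||g_i|| over per-example gradients g_i.

    Near 1: per-example gradients point in similar directions.
    Near 0: gradients largely cancel, i.e. a zero-sum-like regime.
    Illustrative sketch only.
    """
    grads = []
    for x, y in zip(xs, ys):
        model.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        g = torch.cat([p.grad.flatten() for p in model.parameters()
                       if p.grad is not None])
        grads.append(g.clone())
    summed = torch.stack(grads).sum(dim=0)
    return (summed.norm() / sum(g.norm() for g in grads)).item()
```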
The insights derived from understanding loss deceleration and ZSL offer new perspectives on the fundamental training dynamics that underpin language model scaling laws, suggesting potential avenues for developing targeted interventions to improve language models beyond simply increasing their size.
Exploring multilingual NLP challenges through collaborative research at ACL 2025
At Capital One, we have a strong belief in long-term, multi-sector partnerships to advance the frontier of AI research and address large-scale challenges. Our engagement with the broader AI research community, through collaborations with leading universities and other institutions, is a key part of how we foster innovation. This year, Capital One applied researcher Genta Winata has three papers accepted at ACL 2025, demonstrating the breadth of our collaborative efforts: two will be presented at the main conference and one at the FieldMatters workshop.
SEA-VL: Creating a multicultural vision-language dataset for Southeast Asia
Southeast Asia, a region of immense linguistic and cultural diversity, remains significantly underrepresented in vision-language (VL) research, often resulting in AI models that miss crucial cultural nuances. The paper, Crowdsource, crawl, or generate? Creating SEA-VL, a multicultural vision-language dataset for Southeast Asia, introduces SEA-VL, an open-source initiative dedicated to developing high-quality, culturally relevant data for Southeast Asian languages. The initiative explores various data collection methods, including involving local contributors to ensure cultural relevance and diversity. While image crawling proves efficient, with approximately 85% cultural relevance, generative AI struggles to accurately reflect nuanced SEA cultures. Collectively, SEA-VL gathers 1.28 million culturally relevant images, aiming to close the representation gap and foster more inclusive AI systems.
Do language models understand Javanese honorifics?
The Javanese language contains a highly intricate honorific system that reflects social hierarchy, an aspect often poorly represented in existing NLP resources. Recently, LLMs have shown a tendency to favor one honorific level over others, largely due to limitations and imbalances in their training data. The paper, Do language models understand honorific systems in Javanese?, introduces Unggah-Ungguh, a dataset designed to encapsulate the nuances of Javanese speech etiquette. Using this dataset, the research assesses the ability of LLMs to process various levels of Javanese honorifics through classification and machine translation tasks. Experiments, including cross-lingual machine translation with Indonesian and conversational generation tasks, indicate that current LLMs struggle with most honorific levels and exhibit a bias toward certain honorific tiers, highlighting the challenges of processing such culturally specific linguistic phenomena.
What causes knowledge loss in multilingual language models?
Cross-lingual transfer is crucial for enhancing multilingual NLP performance, but traditional training methods can lead to knowledge forgetting. The study, What Causes Knowledge Loss in Multilingual Language Models?, investigates knowledge loss in multilingual contexts, specifically focusing on how linguistic differences impact representational learning. Experiments across 52 languages using low-rank adaptation (LoRA) adapters evaluate parameter-sharing strategies to understand whether they can mitigate forgetting and preserve prior knowledge. Findings indicate that languages written in non-Latin scripts are more susceptible to catastrophic forgetting, while those written in Latin script facilitate more effective cross-lingual transfer, offering insights for optimizing multilingual model training.
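For readers unfamiliar with the setup, attaching LoRA adapters for parameter-efficient per-language fine-tuning looks roughly like the sketch below, which uses the Hugging Face peft library. The base model and hyperparameters here are illustrative placeholders, not the paper's configuration:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative setup: attach low-rank adapters to the attention
# projections of a multilingual base model so that fine-tuning on a
# new language updates only a small set of adapter weights.
base = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")
config = LoraConfig(
    r=8,                                 # rank of the low-rank update
    lora_alpha=16,                       # scaling factor for adapter output
    target_modules=["query_key_value"],  # BLOOM's attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only adapter weights are trainable
```

Because the base weights stay frozen, this kind of setup makes it possible to probe forgetting by swapping or sharing adapters across languages, which is the general family of parameter-sharing strategies the study evaluates.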
Learn more about Capital One’s research at ACL 2025!
We encourage you to attend the oral presentation and the workshops featuring Capital One's research at ACL 2025. This is an excellent opportunity to engage with the authors and gain deeper insights into their work. To explore career opportunities in AI and machine learning at Capital One, please visit our careers page.
We look forward to a fantastic ACL 2025 and an engaging conference in Vienna!