DF-RAG: Enhancing RAG for question answering
A pipeline that dynamically adapts the level of diversity for each query at test time without requiring prior information. (EACL)
Retrieval-augmented generation (RAG) enables large language models (LLMs) to incorporate external knowledge for knowledge-intensive tasks, producing more factually accurate outputs. RAG's performance is therefore contingent on retrieving the right information. However, most existing RAG methods rely on cosine similarity for retrieval, which often introduces redundancy that limits information recall, thereby reducing overall performance on downstream tasks. In this work, we first show that diversity-focused retrieval at the dataset level improves RAG performance across multiple challenging long-context question-answering benchmarks, including multi-hop tasks. We then design an oracle that estimates the hypothetical upper bound achievable if diversity were optimized at the query level, revealing potential gains of up to 18% in F1 scores. Motivated by this gap, we propose DF-RAG, a pipeline that dynamically adapts the level of diversity for each query at test time without requiring prior information. DF-RAG leverages a maximal marginal relevance (MMR)–based scoring mechanism combined with LLM-driven planning and execution, and can be used as a drop-in replacement for cosine-similarity retrieval in modern RAG systems. Comprehensive experiments on five QA benchmarks show that DF-RAG consistently outperforms strong baselines, achieving 4-12% improvements in F1 over vanilla RAG and recovering up to 90% of the gap between vanilla RAG and the theoretical oracle upper bound.
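To illustrate the MMR-based scoring that underpins DF-RAG's retrieval, here is a minimal sketch of greedy maximal marginal relevance selection. The trade-off weight `lam` and the function names are illustrative assumptions, not the paper's exact scoring; DF-RAG additionally adapts the diversity level per query via LLM-driven planning, which is not shown here.

```python
import numpy as np

def mmr_select(query_vec, doc_vecs, k=3, lam=0.7):
    """Greedy maximal marginal relevance (MMR): at each step, pick the
    document that balances relevance to the query against redundancy
    with the documents already selected."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    relevance = [cos(query_vec, d) for d in doc_vecs]
    selected, remaining = [], list(range(len(doc_vecs)))
    while remaining and len(selected) < k:
        def score(i):
            # Redundancy = highest similarity to any already-selected doc.
            redundancy = max(
                (cos(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lam=1.0` this reduces to plain cosine-similarity ranking; lowering `lam` increasingly penalizes near-duplicate passages, which is the diversity knob a per-query method like DF-RAG would tune.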