Retrieval-augmented reasoning with lean language models
August 15, 2025
Authors: Ryan Sze-Yin Chan, Federico Nanni, Tomas Lazauskas, Rosie Wood, Penelope Yong, Lionel Tarassenko, Mark Girolami, James Geddes, Andrew Duncan
cs.AI
Abstract
This technical report details a novel approach to combining reasoning and
retrieval-augmented generation (RAG) within a single, lean language model
architecture. While existing RAG systems typically rely on large-scale models
and external APIs, our work addresses the increasing demand for performant and
privacy-preserving solutions deployable in resource-constrained or secure
environments. Building on recent developments in test-time scaling and
small-scale reasoning models, we develop a retrieval-augmented conversational
agent capable of interpreting complex, domain-specific queries using a
lightweight backbone model. Our system integrates a dense retriever with
fine-tuned Qwen2.5-Instruct models, using synthetic query generation and
reasoning traces derived from frontier models (e.g., DeepSeek-R1) over a
curated corpus, in this case, the NHS A-to-Z condition pages. We explore the
impact of summarisation-based document compression, synthetic data design, and
reasoning-aware fine-tuning on model performance. Evaluation against both
non-reasoning and general-purpose lean models demonstrates that our
domain-specific fine-tuning approach yields substantial gains in answer
accuracy and consistency, approaching frontier-level performance while
remaining feasible for local deployment. All implementation details and code
are publicly released to support reproducibility and adaptation across domains.
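
To make the retrieve-then-generate flow described above concrete, here is a minimal sketch, not the authors' implementation. It assumes a toy bag-of-words embedding in place of a real dense encoder, three invented placeholder passages in place of the NHS A-to-Z corpus, and hypothetical helper names (`embed`, `cosine`, `retrieve`, `build_prompt`). The call to the fine-tuned language model is left out; the sketch stops at the prompt that such a model would receive.

```python
import math
import re
from collections import Counter

# Invented placeholder passages; these are NOT taken from the NHS pages.
CORPUS = {
    "Migraine": "A migraine is a severe headache, often with nausea and sensitivity to light.",
    "Asthma": "Asthma is a lung condition that causes wheezing, coughing and breathlessness.",
    "Eczema": "Eczema causes dry, itchy and cracked skin.",
}


def embed(text):
    """Toy stand-in for a dense encoder: a sparse vector of lowercase token counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, corpus, k=2):
    """Return the titles of the k passages most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        corpus,
        key=lambda title: cosine(query_vec, embed(corpus[title])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, corpus, k=2):
    """Assemble the retrieved passages and the question into one prompt string."""
    titles = retrieve(query, corpus, k)
    context = "\n".join(f"{title}: {corpus[title]}" for title in titles)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    print(build_prompt("headache with nausea and sensitivity to light", CORPUS, k=1))
```

In a real system the `embed` function would be replaced by a learned sentence encoder and the returned prompt would be passed to the fine-tuned model; the ranking and prompt-assembly steps would keep the same shape.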