Retrieval-augmented reasoning with lean language models

Abstract

This technical report details a novel approach to combining reasoning and retrieval-augmented generation (RAG) within a single, lean language model architecture. While existing RAG systems typically rely on large-scale models and external APIs, our work addresses the growing demand for performant, privacy-preserving solutions that can be deployed in resource-constrained or secure environments. Building on recent developments in test-time scaling and small-scale reasoning models, we develop a retrieval-augmented conversational agent capable of interpreting complex, domain-specific queries using a lightweight backbone model. Our system integrates a dense retriever with fine-tuned Qwen2.5-Instruct models, using synthetic query generation and reasoning traces derived from frontier models (e.g., DeepSeek-R1) over a curated corpus, in this case the NHS A-to-Z condition pages. We explore the impact of summarisation-based document compression, synthetic data design, and reasoning-aware fine-tuning on model performance. Evaluation against both non-reasoning and general-purpose lean models demonstrates that our domain-specific fine-tuning approach yields substantial gains in answer accuracy and consistency, approaching frontier-level performance while remaining feasible for local deployment. All implementation details and code are publicly released to support reproducibility and adaptation across domains.

