MCompassRAG: Topic Metadata as a Semantic Compass for Paragraph-Level Retrieval

Abstract

The performance of retrieval-augmented generation (RAG) systems depends critically on how documents are chunked and searched. Fine-grained chunks can improve retrieval precision, but they expand the search space and so increase latency and cost. Larger chunks reduce the number of candidates, but they make dense similarity less reliable, because each chunk's representation mixes multiple topics and carries more semantic noise. This trade-off is especially limiting in deep research tasks, where retrieval must be both fast and precise across large, heterogeneous corpora. We introduce MCompassRAG, a metadata-guided retrieval framework that uses topic-level signals as a semantic compass for selecting relevant evidence. Instead of relying only on cosine similarity between queries and noisy chunk embeddings, MCompassRAG enriches chunk representations with topic metadata in the same embedding space and trains a lightweight retriever through LLM-teacher distillation. At inference time, MCompassRAG performs topic-aware retrieval without additional LLM calls, improving both efficiency and evidence quality. Across six complex retrieval benchmarks, MCompassRAG improves information efficiency (IE) by 8.24% on average, with more than 5 times lower latency than the strongest efficient RAG baselines. Code is available at https://github.com/AmirAbaskohi/MCompassRAG.
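
To make the idea of topic-enriched chunk representations concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that each chunk has one topic embedding living in the same vector space as the chunk embedding, and that the two are fused by a simple weighted average controlled by a hypothetical parameter `alpha`. The function names (`enrich`, `cosine_scores`, `retrieve`) and the toy vectors are invented for illustration, and the distilled lightweight retriever described in the abstract is not modeled here.

```python
import numpy as np


def l2_normalize(x):
    """Scale vectors to unit length along the last axis."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norm, 1e-12, None)


def enrich(chunk_embs, topic_embs, alpha=0.5):
    """Fuse chunk and topic embeddings (assumed fusion rule: weighted average)."""
    mixed = (1.0 - alpha) * l2_normalize(chunk_embs) + alpha * l2_normalize(topic_embs)
    return l2_normalize(mixed)


def cosine_scores(query, embs):
    """Cosine similarity between one query vector and each row of embs."""
    return l2_normalize(embs) @ l2_normalize(query)


def retrieve(query, chunk_embs, topic_embs, k=1, alpha=0.5):
    """Rank chunks by cosine similarity against topic-enriched embeddings."""
    scores = cosine_scores(query, enrich(chunk_embs, topic_embs, alpha))
    top = np.argsort(-scores)[:k]
    return top.tolist(), scores


# Toy data: two chunks that are equally similar to the query on their own,
# but only chunk 0 is tagged with a topic aligned with the query.
query = np.array([1.0, 0.0, 0.0])
chunk_embs = np.array([[0.6, 0.8, 0.0],
                       [0.6, 0.0, 0.8]])
topic_embs = np.array([[1.0, 0.0, 0.0],
                       [0.0, 0.0, 1.0]])

plain_scores = cosine_scores(query, chunk_embs)
top, enriched_scores = retrieve(query, chunk_embs, topic_embs, k=1)
print("plain:", plain_scores, "enriched:", enriched_scores, "top:", top)
```

In this toy setup the plain chunk embeddings tie, while the topic-enriched scores separate the two chunks. This is the kind of disambiguation a topic-level signal is meant to provide, though the actual fusion and training procedure in MCompassRAG may differ.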