Select to Know: An Internal-External Knowledge Self-Selection Framework for Domain-Specific Question Answering
August 21, 2025
Authors: Bolei He, Xinran He, Run Shao, Shanfu Shu, Xianwei Xue, Mingquan Cheng, Haifeng Li, Zhenhua Ling
cs.AI
Abstract
Large Language Models (LLMs) perform well in general QA but often struggle in
domain-specific scenarios. Retrieval-Augmented Generation (RAG) introduces
external knowledge but suffers from hallucinations and latency due to noisy
retrievals. Continued pretraining internalizes domain knowledge but is costly
and lacks cross-domain flexibility. We attribute this challenge to the
long-tail distribution of domain knowledge, which leaves partial yet useful
internal knowledge underutilized. We further argue that knowledge acquisition
should be progressive, mirroring human learning: first understanding concepts,
then applying them to complex reasoning. To address this, we propose Select2Know
(S2K), a cost-effective framework that internalizes domain knowledge through an
internal-external knowledge self-selection strategy and selective supervised
fine-tuning. We also introduce a structured reasoning data generation pipeline
and integrate GRPO to enhance reasoning ability. Experiments on medical, legal,
and financial QA benchmarks show that S2K consistently outperforms existing
methods and matches domain-pretrained LLMs with significantly lower cost.
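The internal-external self-selection idea can be illustrated with a toy routing rule. The confidence score, threshold, and fallback order below are illustrative assumptions for exposition, not the paper's actual selection mechanism:

```python
from typing import Optional

def self_select(internal_conf: float,
                retrieved_passage: Optional[str],
                threshold: float = 0.7) -> str:
    """Route a question to internal or external knowledge.

    internal_conf: hypothetical confidence the model assigns to its
    own (parametric) answer; the 0.7 threshold is an assumption.
    """
    if internal_conf >= threshold:
        return "internal"   # model's own knowledge is trusted
    if retrieved_passage is not None:
        return "external"   # fall back to retrieved (RAG) knowledge
    return "internal"       # nothing retrieved; answer from parameters

# Examples of the routing decision:
print(self_select(0.9, None))             # -> internal
print(self_select(0.3, "retrieved doc"))  # -> external
```

The point of such routing is cost: confident questions skip retrieval entirely, avoiding the noise and latency the abstract attributes to RAG.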