Select to Know: An Internal-External Knowledge Self-Selection Framework for Domain-Specific Question Answering
August 21, 2025
Authors: Bolei He, Xinran He, Run Shao, Shanfu Shu, Xianwei Xue, Mingquan Cheng, Haifeng Li, Zhenhua Ling
cs.AI
Abstract
Large Language Models (LLMs) perform well in general QA but often struggle in
domain-specific scenarios. Retrieval-Augmented Generation (RAG) introduces
external knowledge but suffers from hallucinations and latency due to noisy
retrievals. Continued pretraining internalizes domain knowledge but is costly
and lacks cross-domain flexibility. We attribute this challenge to the
long-tail distribution of domain knowledge, which leaves partial yet useful
internal knowledge underutilized. We further argue that knowledge acquisition
should be progressive, mirroring human learning: first understanding concepts,
then applying them to complex reasoning. To address this, we propose Select2Know
(S2K), a cost-effective framework that internalizes domain knowledge through an
internal-external knowledge self-selection strategy and selective supervised
fine-tuning. We also introduce a structured reasoning data generation pipeline
and integrate GRPO to enhance reasoning ability. Experiments on medical, legal,
and financial QA benchmarks show that S2K consistently outperforms existing
methods and matches domain-pretrained LLMs with significantly lower cost.
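The abstract names two mechanisms without detailing them: an internal-external knowledge self-selection gate and selective supervised fine-tuning. The sketch below is an illustrative assumption of how such components could look, not the paper's actual algorithm: the self-selection gate answers from internal knowledge when the model is confident and falls back to retrieved (external) context otherwise, and "selective SFT" is read here as masking the training loss to low-confidence tokens so that only long-tail knowledge is internalized. The threshold values are hypothetical.

```python
# Hypothetical sketch of S2K-style components (assumptions, not the
# paper's published method).

def route_question(internal_confidence: float, tau: float = 0.7) -> str:
    """Self-selection gate: use the model's internal knowledge when its
    confidence clears the threshold, otherwise retrieve external context."""
    return "internal" if internal_confidence >= tau else "external"

def selective_loss_mask(token_logprobs: list[float],
                        threshold: float = -1.5) -> list[int]:
    """Selective-SFT mask: 1 = train on this token (model is unsure,
    i.e. low log-probability), 0 = skip (already-known knowledge)."""
    return [0 if lp >= threshold else 1 for lp in token_logprobs]

# Example: a confident answer stays internal; an unsure one triggers RAG.
print(route_question(0.92))   # confident -> answer from internal knowledge
print(route_question(0.31))   # unsure -> fall back to external retrieval

# Example: only the low-probability (long-tail) token gets SFT loss.
print(selective_loss_mask([-0.1, -2.4, -0.6]))
```

Under this reading, the gate avoids RAG latency and retrieval noise on questions the model already answers well, while the loss mask keeps fine-tuning cost focused on the underutilized long-tail knowledge the abstract highlights.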