Select to Know: An Internal-External Knowledge Self-Selection Framework for Domain-Specific Question Answering

Abstract

Large Language Models (LLMs) perform well in general question answering (QA) but often struggle in domain-specific scenarios. Retrieval-Augmented Generation (RAG) introduces external knowledge but suffers from hallucinations and latency caused by noisy retrieval. Continued pretraining internalizes domain knowledge but is costly and lacks cross-domain flexibility. We attribute this challenge to the long-tail distribution of domain knowledge, which leaves partial yet useful internal knowledge underutilized. We further argue that knowledge acquisition should be progressive, mirroring human learning: first understanding concepts, then applying them to complex reasoning. To address this, we propose Select2Know (S2K), a cost-effective framework that internalizes domain knowledge through an internal-external knowledge self-selection strategy and selective supervised fine-tuning. We also introduce a structured reasoning data generation pipeline and integrate Group Relative Policy Optimization (GRPO) to enhance reasoning ability. Experiments on medical, legal, and financial QA benchmarks show that S2K consistently outperforms existing methods and matches domain-pretrained LLMs at significantly lower cost.
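The abstract names GRPO without describing it. For readers unfamiliar with it, the following is a minimal sketch of the group-relative advantage used in the standard GRPO formulation for outcome rewards: several completions are sampled for the same question, each is scored, and each reward is normalized by the mean and standard deviation of its own group, so no learned value model is needed. The function name, the binary correctness reward, the use of population standard deviation, and the `eps` term are illustrative assumptions. This is not S2K's training code, and the paper's actual reward design is not specified in this abstract.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each reward against its own group.

    rewards: scalar rewards for G completions sampled for the same question.
    eps: small constant (an assumption of this sketch) that avoids division
         by zero when every completion in the group gets the same reward.
    """
    if not rewards:
        raise ValueError("need at least one reward")
    mu = fmean(rewards)        # group mean reward
    sigma = pstdev(rewards)    # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical example: 4 sampled answers to one domain question,
# rewarded 1.0 if the final answer is correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
adv = group_relative_advantages(rewards)
print([round(a, 3) for a in adv])  # prints [1.0, -1.0, -1.0, 1.0]
```

Correct answers receive a positive advantage and incorrect ones a negative advantage relative to their siblings; if every completion in a group earns the same reward, all advantages are zero and that group contributes no policy-gradient signal.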
