ChemDFM-R: A Chemical Reasoner LLM Enhanced with Atomized Chemical Knowledge

Abstract

While large language models (LLMs) have achieved impressive progress, their application in scientific domains such as chemistry remains hindered by shallow domain understanding and limited reasoning capabilities. In this work, we focus on the specific field of chemistry and develop a Chemical Reasoner LLM, ChemDFM-R. We first construct a comprehensive dataset of atomized knowledge points to enhance the model's understanding of the fundamental principles and logical structure of chemistry. Then, we propose a mix-sourced distillation strategy that integrates expert-curated knowledge with general-domain reasoning skills, followed by domain-specific reinforcement learning to enhance chemical reasoning. Experiments on diverse chemical benchmarks demonstrate that ChemDFM-R achieves state-of-the-art performance while providing interpretable, rationale-driven outputs. Further case studies illustrate how explicit reasoning chains significantly improve the reliability, transparency, and practical utility of the model in real-world human-AI collaboration scenarios.
