Mol-R1: Towards Explicit Long-CoT Reasoning in Molecule Discovery
August 11, 2025
Authors: Jiatong Li, Weida Wang, Qinggang Zhang, Junxian Li, Di Zhang, Changmeng Zheng, Shufei Zhang, Xiaoyong Wei, Qing Li
cs.AI
Abstract
Large language models (LLMs), especially Explicit Long Chain-of-Thought (CoT)
reasoning models like DeepSeek-R1 and QWQ, have demonstrated powerful reasoning
capabilities, achieving impressive performance in commonsense reasoning and
mathematical inference. Despite their effectiveness, Long-CoT reasoning models
are often criticized for their limited ability and low efficiency in
knowledge-intensive domains such as molecule discovery. Success in this field
requires a precise understanding of domain knowledge, including molecular
structures and chemical principles, which is challenging due to the inherent
complexity of molecular data and the scarcity of high-quality expert
annotations. To bridge this gap, we introduce Mol-R1, a novel framework
designed to improve explainability and reasoning performance of R1-like
Explicit Long-CoT reasoning LLMs in text-based molecule generation. Our
approach begins with a high-quality reasoning dataset curated through Prior
Regulation via In-context Distillation (PRID), a dedicated distillation
strategy to effectively generate paired reasoning traces guided by prior
regulations. Building upon this, we introduce MoIA, Molecular Iterative
Adaptation, a sophisticated training strategy that iteratively combines
Supervised Fine-tuning (SFT) with Reinforced Policy Optimization (RPO),
tailored to boost the reasoning performance of R1-like reasoning models for
molecule discovery. Finally, we examine the performance of Mol-R1 in the
text-based molecule reasoning generation task, showing superior performance
against existing baselines.