Mol-R1: Towards Explicit Long-CoT Reasoning in Molecule Discovery

August 11, 2025
Authors: Jiatong Li, Weida Wang, Qinggang Zhang, Junxian Li, Di Zhang, Changmeng Zheng, Shufei Zhang, Xiaoyong Wei, Qing Li
cs.AI

Abstract

Large language models (LLMs), especially Explicit Long Chain-of-Thought (CoT) reasoning models like DeepSeek-R1 and QWQ, have demonstrated powerful reasoning capabilities, achieving impressive performance in commonsense reasoning and mathematical inference. Despite their effectiveness, Long-CoT reasoning models are often criticized for limited ability and low efficiency in knowledge-intensive domains such as molecule discovery. Success in this field requires a precise understanding of domain knowledge, including molecular structures and chemical principles, which is challenging due to the inherent complexity of molecular data and the scarcity of high-quality expert annotations. To bridge this gap, we introduce Mol-R1, a novel framework designed to improve the explainability and reasoning performance of R1-like Explicit Long-CoT reasoning LLMs in text-based molecule generation. Our approach begins with a high-quality reasoning dataset curated through Prior Regulation via In-context Distillation (PRID), a dedicated distillation strategy that effectively generates paired reasoning traces guided by prior regulations. Building on this, we introduce MoIA, Molecular Iterative Adaptation, a training strategy that iteratively combines Supervised Fine-tuning (SFT) with Reinforced Policy Optimization (RPO), tailored to boost the reasoning performance of R1-like reasoning models for molecule discovery. Finally, we evaluate Mol-R1 on the text-based molecule reasoning generation task, where it outperforms existing baselines.
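
The abstract describes MoIA as an iterative alternation of SFT and RPO, seeded by PRID-distilled reasoning traces. The sketch below is a minimal, hypothetical rendering of that loop; the `Policy` class and the helpers `sft`, `rpo`, `generate`, and `reward_fn` are illustrative stand-ins, not the paper's actual interfaces.

```python
# Hypothetical sketch of the MoIA loop described in the abstract:
# alternate Supervised Fine-tuning (SFT) on a pool of reasoning traces
# with Reinforced Policy Optimization (RPO) on self-generated traces
# scored by a reward. All names and interfaces are illustrative.

from typing import Callable, List, Tuple

Trace = Tuple[str, str]  # (molecule description, reasoning trace + SMILES)

class Policy:
    """Stand-in for an R1-like reasoning LLM with training hooks."""
    def sft(self, traces: List[Trace]) -> None: pass
    def rpo(self, samples: List[Trace], rewards: List[float]) -> None: pass
    def generate(self, prompt: str) -> str: return "<think>...</think> CCO"

def moia_train(
    policy: Policy,
    seed_traces: List[Trace],              # PRID-distilled reasoning pairs
    reward_fn: Callable[[str, str], float],  # e.g., SMILES validity / match
    n_rounds: int = 3,
    keep_threshold: float = 0.5,
) -> None:
    """Iteratively adapt the policy: an SFT round, then an RPO round."""
    pool = list(seed_traces)
    for _ in range(n_rounds):
        # 1) Supervised fine-tuning on the current trace pool.
        policy.sft(pool)

        # 2) Sample fresh reasoning traces from the updated policy.
        samples = [(p, policy.generate(p)) for p, _ in seed_traces]

        # 3) Score the samples and run reinforced policy optimization.
        rewards = [reward_fn(p, out) for p, out in samples]
        policy.rpo(samples, rewards)

        # 4) Keep high-reward traces to grow the pool for the next round.
        pool += [s for s, r in zip(samples, rewards) if r >= keep_threshold]
```

Under this reading, each round's RPO output supplies new high-reward traces for the next SFT round, which is one plausible way the "iterative adaptation" in the name could be realized.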