Mol-R1: Towards Explicit Long-CoT Reasoning in Molecule Discovery

August 11, 2025
Authors: Jiatong Li, Weida Wang, Qinggang Zhang, Junxian Li, Di Zhang, Changmeng Zheng, Shufei Zhang, Xiaoyong Wei, Qing Li
cs.AI

Abstract

Large language models (LLMs), especially Explicit Long Chain-of-Thought (CoT) reasoning models like DeepSeek-R1 and QWQ, have demonstrated powerful reasoning capabilities, achieving impressive performance in commonsense reasoning and mathematical inference. Despite their effectiveness, Long-CoT reasoning models are often criticized for limited ability and low efficiency in knowledge-intensive domains such as molecule discovery. Success in this field requires a precise understanding of domain knowledge, including molecular structures and chemical principles, which is challenging due to the inherent complexity of molecular data and the scarcity of high-quality expert annotations. To bridge this gap, we introduce Mol-R1, a novel framework designed to improve the explainability and reasoning performance of R1-like Explicit Long-CoT reasoning LLMs in text-based molecule generation. Our approach begins with a high-quality reasoning dataset curated through Prior Regulation via In-context Distillation (PRID), a dedicated distillation strategy that effectively generates paired reasoning traces guided by prior regulations. Building on this, we introduce MoIA, Molecular Iterative Adaptation, a training strategy that iteratively combines Supervised Fine-tuning (SFT) with Reinforced Policy Optimization (RPO), tailored to boost the reasoning performance of R1-like reasoning models for molecule discovery. Finally, we evaluate Mol-R1 on the text-based molecule reasoning generation task, where it outperforms existing baselines.
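
The abstract describes MoIA as an iterative alternation of SFT and RPO, seeded by PRID-distilled reasoning traces. The sketch below is a minimal, hypothetical rendering of that loop; the `Policy` class and the helpers `sft`, `rpo`, `generate`, and `reward_fn` are illustrative stand-ins, not the paper's actual interfaces.

```python
# Hypothetical sketch of the MoIA loop described in the abstract:
# alternate Supervised Fine-tuning (SFT) on a pool of reasoning traces
# with Reinforced Policy Optimization (RPO) on self-generated traces
# scored by a reward. All names and interfaces are illustrative.

from typing import Callable, List, Tuple

Trace = Tuple[str, str]  # (molecule description, reasoning trace + SMILES)

class Policy:
    """Stand-in for an R1-like reasoning LLM with training hooks."""
    def sft(self, traces: List[Trace]) -> None: pass
    def rpo(self, samples: List[Trace], rewards: List[float]) -> None: pass
    def generate(self, prompt: str) -> str: return "<think>...</think> CCO"

def moia_train(
    policy: Policy,
    seed_traces: List[Trace],              # PRID-distilled reasoning pairs
    reward_fn: Callable[[str, str], float],  # e.g., SMILES validity / match
    n_rounds: int = 3,
    keep_threshold: float = 0.5,
) -> None:
    """Iteratively adapt the policy: an SFT round, then an RPO round."""
    pool = list(seed_traces)
    for _ in range(n_rounds):
        # 1) Supervised fine-tuning on the current trace pool.
        policy.sft(pool)

        # 2) Sample fresh reasoning traces from the updated policy.
        samples = [(p, policy.generate(p)) for p, _ in seed_traces]

        # 3) Score the samples and run reinforced policy optimization.
        rewards = [reward_fn(p, out) for p, out in samples]
        policy.rpo(samples, rewards)

        # 4) Keep high-reward traces to grow the pool for the next round.
        pool += [s for s, r in zip(samples, rewards) if r >= keep_threshold]
```

Under this reading, each round's RPO output supplies new high-reward traces for the next SFT round, which is one plausible way the "iterative adaptation" in the name could be realized.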