

Mol-R1: Towards Explicit Long-CoT Reasoning in Molecule Discovery

August 11, 2025
Authors: Jiatong Li, Weida Wang, Qinggang Zhang, Junxian Li, Di Zhang, Changmeng Zheng, Shufei Zhang, Xiaoyong Wei, Qing Li
cs.AI

Abstract

Large language models (LLMs), especially Explicit Long Chain-of-Thought (CoT) reasoning models such as DeepSeek-R1 and QWQ, have demonstrated powerful reasoning capabilities, achieving impressive performance in commonsense reasoning and mathematical inference. Despite their effectiveness, Long-CoT reasoning models are often criticized for limited ability and low efficiency in knowledge-intensive domains such as molecule discovery. Success in this field requires a precise understanding of domain knowledge, including molecular structures and chemical principles, which is challenging due to the inherent complexity of molecular data and the scarcity of high-quality expert annotations. To bridge this gap, we introduce Mol-R1, a novel framework designed to improve the explainability and reasoning performance of R1-like Explicit Long-CoT reasoning LLMs in text-based molecule generation. Our approach begins with a high-quality reasoning dataset curated through Prior Regulation via In-context Distillation (PRID), a dedicated distillation strategy that effectively generates paired reasoning traces guided by prior regulations. Building on this, we introduce MoIA (Molecular Iterative Adaptation), a training strategy that iteratively combines Supervised Fine-tuning (SFT) with Reinforced Policy Optimization (RPO), tailored to boost the reasoning performance of R1-like reasoning models for molecule discovery. Finally, we evaluate Mol-R1 on the text-based molecule reasoning generation task, where it outperforms existing baselines.
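
The abstract describes MoIA only at a high level: alternate supervised fine-tuning on PRID-distilled reasoning traces with a reward-driven policy-optimization step. The minimal Python sketch below illustrates what such an SFT/RPO alternation could look like in outline; every name in it (StubPolicy, sft_step, rpo_step, validity_reward) is a hypothetical stand-in, not the authors' actual implementation, and the toy reward merely checks exact-match where the paper presumably uses chemistry-aware signals.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Example:
    prompt: str     # textual molecule description
    reasoning: str  # distilled long-CoT trace (e.g., produced via PRID)
    smiles: str     # reference molecule as a SMILES string

class StubPolicy:
    """Placeholder standing in for an R1-like LLM policy."""
    def sft_step(self, prompt: str, target: str) -> None:
        pass  # one supervised fine-tuning gradient step (omitted)

    def sample(self, prompt: str, n: int) -> List[str]:
        return ["C"] * n  # pretend the model always proposes methane

    def rpo_step(self, prompt: str, samples: List[str], rewards: List[float]) -> None:
        pass  # one reinforced policy-optimization update (omitted)

def validity_reward(candidate: str, reference: str) -> float:
    # Toy reward: exact match against the reference molecule.
    return 1.0 if candidate == reference else 0.0

def moia_train(model: StubPolicy, data: List[Example],
               reward_fn: Callable[[str, str], float], rounds: int = 3) -> StubPolicy:
    """Alternate SFT and RPO for a few rounds, as MoIA is described above."""
    for _ in range(rounds):
        # Stage 1: SFT on (description, reasoning trace + molecule) pairs.
        for ex in data:
            model.sft_step(ex.prompt, ex.reasoning + "\n" + ex.smiles)
        # Stage 2: RPO on sampled generations scored by the reward.
        for ex in data:
            samples = model.sample(ex.prompt, n=4)
            model.rpo_step(ex.prompt, samples,
                           [reward_fn(s, ex.smiles) for s in samples])
    return model

if __name__ == "__main__":
    data = [Example("a one-carbon alkane", "The smallest alkane has one carbon.", "C")]
    moia_train(StubPolicy(), data, validity_reward)
```

The point of the alternation is that each SFT pass re-anchors the policy on curated reasoning traces before the next RPO pass pushes it toward higher-reward generations.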