M^4olGen: Multi-Agent, Multi-Stage Molecular Generation under Precise Multi-Property Constraints
January 15, 2026
Authors: Yizhan Li, Florence Cloutier, Sifan Wu, Ali Parviz, Boris Knyazev, Yan Zhang, Glen Berseth, Bang Liu
cs.AI
Abstract
Generating molecules that satisfy precise numeric constraints on multiple physicochemical properties is critical and challenging. Although large language models (LLMs) are expressive, without external structure and feedback they struggle with precise multi-objective control and numeric reasoning. We introduce M^4olGen, a fragment-level, retrieval-augmented, two-stage framework for molecule generation under multi-property constraints. Stage I (prototype generation): a multi-agent reasoner performs retrieval-anchored, fragment-level edits to produce a candidate near the feasible region. Stage II (RL-based fine-grained optimization): a fragment-level optimizer trained with Group Relative Policy Optimization (GRPO) applies one- or multi-hop refinements that explicitly minimize the errors between the candidate's properties and the targets, while regulating edit complexity and deviation from the prototype. Both stages are supported by a large, automatically curated dataset of fragment-edit reasoning chains annotated with measured property deltas, which enables deterministic, reproducible supervision and controllable multi-hop reasoning. Unlike prior work, our framework leverages fragments to reason more effectively about molecules and supports controllable refinement toward numeric targets. Experiments under two sets of property constraints (QED, LogP, and molecular weight; HOMO and LUMO) show consistent gains in validity and in precise satisfaction of multi-property targets, outperforming strong LLMs and graph-based algorithms.
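The abstract does not specify the Stage II reward. As a minimal sketch only, assume a hypothetical reward equal to the negative mean scale-normalized absolute property error, minus penalties on the number of edit hops and on a deviation score from the prototype. The weights, scales, and the `Candidate` fields below are illustrative assumptions, not the authors' settings. GRPO then scores each sampled refinement relative to its own group by standardizing rewards within that group, so no learned value function is needed:

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Candidate:
    """One sampled refinement of a Stage I prototype (hypothetical fields)."""
    props: dict[str, float]  # property values of the refined molecule (computed elsewhere)
    hops: int                # number of fragment edits applied in this refinement
    deviation: float         # assumed distance from the prototype, scaled to [0, 1]


def property_error(props: dict[str, float], targets: dict[str, float],
                   scales: dict[str, float]) -> float:
    """Mean absolute error to the targets, each property divided by its scale."""
    return mean(abs(props[k] - t) / scales[k] for k, t in targets.items())


def reward(c: Candidate, targets: dict[str, float], scales: dict[str, float],
           hop_weight: float = 0.05, dev_weight: float = 0.1) -> float:
    """Hypothetical reward: smaller property error is better; extra hops and drift are penalized."""
    return (-property_error(c.props, targets, scales)
            - hop_weight * c.hops
            - dev_weight * c.deviation)


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: standardize each reward against its own sampled group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy group of three refinements for one QED/LogP/MW target (illustrative values only).
targets = {"QED": 0.80, "LogP": 2.5, "MW": 350.0}
scales = {"QED": 1.0, "LogP": 1.0, "MW": 100.0}  # assumed normalizers so MW does not dominate
group = [
    Candidate({"QED": 0.78, "LogP": 2.4, "MW": 345.0}, hops=1, deviation=0.1),
    Candidate({"QED": 0.70, "LogP": 3.1, "MW": 380.0}, hops=2, deviation=0.3),
    Candidate({"QED": 0.60, "LogP": 1.5, "MW": 300.0}, hops=3, deviation=0.5),
]
rewards = [reward(c, targets, scales) for c in group]
advantages = group_relative_advantages(rewards)
for c, r, a in zip(group, rewards, advantages):
    print(f"hops={c.hops} reward={r:.3f} advantage={a:+.3f}")
```

In this toy group, the one-hop candidate closest to the targets receives a positive advantage and the three-hop candidate that drifts furthest receives a negative one, so a policy-gradient update would shift probability toward the former. Normalizing each property by a scale is one simple way to keep a large-valued property such as molecular weight from dominating the error; the actual normalization used by M^4olGen is not stated here.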