Mozi: Governed Autonomy for Drug Discovery LLM Agents
March 4, 2026
Authors: He Cao, Siyu Liu, Fan Zhang, Zijing Liu, Hao Li, Bin Feng, Shengyuan Bai, Leqing Chen, Kai Xie, Yu Li
cs.AI
Abstract
Tool-augmented large language model (LLM) agents promise to unify scientific reasoning with computation, yet their deployment in high-stakes domains like drug discovery is bottlenecked by two critical barriers: ungoverned tool use and poor long-horizon reliability. In dependency-heavy pharmaceutical pipelines, autonomous agents often drift into irreproducible trajectories, where early-stage hallucinations compound multiplicatively into downstream failures. To overcome this, we present Mozi, a dual-layer architecture that combines the flexibility of generative AI with the deterministic rigor of computational biology. Layer A (Control Plane) establishes a governed supervisor-worker hierarchy that enforces role-based tool isolation, limits execution to constrained action spaces, and drives reflection-based replanning. Layer B (Workflow Plane) operationalizes canonical drug discovery stages, from Target Identification to Lead Optimization, as stateful, composable skill graphs. This layer integrates strict data contracts and strategic human-in-the-loop (HITL) checkpoints to safeguard scientific validity at high-uncertainty decision boundaries.
Operating on the design principle of "free-form reasoning for safe tasks, structured execution for long-horizon pipelines," Mozi provides built-in robustness mechanisms and trace-level auditability to mitigate error accumulation. We evaluate Mozi on PharmaBench, a curated benchmark for biomedical agents, demonstrating superior orchestration accuracy over existing baselines. Furthermore, through end-to-end therapeutic case studies, we demonstrate Mozi's ability to navigate massive chemical spaces, enforce stringent toxicity filters, and generate highly competitive in silico candidates, effectively transforming the LLM from a fragile conversationalist into a reliable, governed co-scientist.
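The two layers described in the abstract can be pictured with a small sketch. This is not Mozi's implementation: every name here (`ROLE_TOOLS`, `TOOLS`, `call_tool`, `Stage`, `run_pipeline`, `PIPELINE`) and every tool output is a hypothetical placeholder chosen for illustration. The sketch assumes that role-based tool isolation can be modeled as a per-role allow-list, that a data contract can be modeled as required and provided state keys, and that a HITL checkpoint can be modeled as an approval callback.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical allow-list: each worker role may call only these tools
# (an assumed stand-in for role-based tool isolation in Layer A).
ROLE_TOOLS: Dict[str, Set[str]] = {
    "target_worker": {"lookup_target"},
    "chem_worker": {"generate_molecules", "tox_filter"},
}

# Placeholder tools returning made-up values; real tools would call
# databases or simulators.
TOOLS: Dict[str, Callable[[dict], dict]] = {
    "lookup_target": lambda s: {"target": "TARGET_X"},
    "generate_molecules": lambda s: {"molecules": ["mol_a", "mol_b", "mol_c"]},
    "tox_filter": lambda s: {
        "candidates": [m for m in s["molecules"] if m != "mol_b"]
    },
}


def call_tool(role: str, tool: str, state: dict) -> dict:
    """Run a tool only if the role's allow-list permits it."""
    if tool not in ROLE_TOOLS.get(role, set()):
        raise PermissionError(f"{role} may not call {tool}")
    return TOOLS[tool](state)


@dataclass
class Stage:
    """One node of a linear skill graph (an assumed Layer B analogue)."""
    name: str
    role: str
    tool: str
    requires: Set[str]            # state keys needed before the stage runs
    provides: Set[str]            # state keys the stage must produce
    needs_approval: bool = False  # True marks a human-in-the-loop checkpoint


def run_pipeline(stages: List[Stage],
                 approve: Callable[[str, dict], bool]) -> dict:
    """Run stages in order, checking contracts and recording a trace."""
    state: dict = {}
    trace: List[Tuple[str, str]] = []
    for stage in stages:
        missing = stage.requires - set(state)
        if missing:
            raise ValueError(f"{stage.name}: missing inputs {sorted(missing)}")
        output = call_tool(stage.role, stage.tool, state)
        if not stage.provides <= set(output):
            raise ValueError(f"{stage.name}: output violates its contract")
        if stage.needs_approval and not approve(stage.name, output):
            trace.append((stage.name, "rejected"))
            break  # stop before an unapproved result reaches later stages
        state.update(output)
        trace.append((stage.name, "ok"))
    state["trace"] = trace
    return state


PIPELINE = [
    Stage("target_id", "target_worker", "lookup_target", set(), {"target"}),
    Stage("generation", "chem_worker", "generate_molecules",
          {"target"}, {"molecules"}),
    Stage("toxicity", "chem_worker", "tox_filter",
          {"molecules"}, {"candidates"}, needs_approval=True),
]

result = run_pipeline(PIPELINE, approve=lambda name, output: True)
print(result["candidates"])  # ['mol_a', 'mol_c']
```

In this toy version, a worker that requests a tool outside its allow-list fails immediately with `PermissionError`, and a rejected checkpoint halts the run while leaving the rejection in the trace, which is one plausible way to keep early errors from propagating silently.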