CellForge: Agentic Design of Virtual Cell Models

Abstract

Virtual cell modeling is an emerging frontier at the intersection of artificial intelligence and biology that aims to quantitatively predict quantities such as responses to diverse perturbations. However, autonomously building computational models of virtual cells is challenging because of the complexity of biological systems, the heterogeneity of data modalities, and the need for domain-specific expertise across multiple disciplines. Here, we introduce CellForge, an agentic system built on a multi-agent framework that transforms the provided biological datasets and research objectives directly into optimized computational models for virtual cells. More specifically, given only raw single-cell multi-omics data and task descriptions as input, CellForge outputs both an optimized model architecture and executable code for training virtual cell models and running inference. The framework integrates three core modules: Task Analysis, which characterizes the provided dataset and retrieves relevant literature; Method Design, in which specialized agents collaboratively develop optimized modeling strategies; and Experiment Execution, which automatically generates code. The agents in the Method Design module are divided into experts with differing perspectives and a central moderator, and they exchange solutions collaboratively until they reach a reasonable consensus. We demonstrate CellForge's capabilities in single-cell perturbation prediction using six diverse datasets that encompass gene knockouts, drug treatments, and cytokine stimulations across multiple modalities. CellForge consistently outperforms task-specific state-of-the-art methods. Overall, CellForge demonstrates how iterative interaction between LLM agents with differing perspectives provides better solutions than directly addressing a modeling challenge. Our code is publicly available at https://github.com/gersteinlab/CellForge.
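To make the expert-and-moderator pattern concrete, the following is a minimal toy sketch of a moderated consensus loop. It is not CellForge's implementation. The `Expert` class, the `moderate` function, the component names, and the scoring rule are all hypothetical, and each expert is a deterministic stand-in for what would be an LLM agent.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

# Assumption: a modeling plan is represented as a set of named design components.
Plan = FrozenSet[str]


@dataclass(frozen=True)
class Expert:
    """Toy stand-in for an LLM expert agent (hypothetical, not CellForge's API)."""
    name: str
    required: str  # the single component this expert insists on

    def critique(self, plan: Plan) -> Tuple[float, Optional[str]]:
        # Score 1.0 if this expert's concern is addressed; otherwise 0.0 and a suggestion.
        if self.required in plan:
            return 1.0, None
        return 0.0, self.required


def moderate(experts: List[Expert], plan: Plan, threshold: float = 1.0,
             max_rounds: int = 5) -> Tuple[Plan, int, bool]:
    """Moderator loop: gather critiques, merge suggestions, stop once all experts agree."""
    for round_no in range(1, max_rounds + 1):
        critiques = [expert.critique(plan) for expert in experts]
        if all(score >= threshold for score, _ in critiques):
            return plan, round_no, True  # consensus reached in this round
        # Fold every outstanding suggestion into the plan and try again.
        suggestions = {s for _, s in critiques if s is not None}
        plan = plan | suggestions
    return plan, max_rounds, False  # round budget exhausted without consensus


# Illustrative experts and component names (assumptions, not taken from the paper).
experts = [
    Expert("data", "batch_correction"),
    Expert("model", "perturbation_encoder"),
    Expert("training", "held_out_split"),
]
final_plan, rounds, agreed = moderate(experts, frozenset({"perturbation_encoder"}))
print(sorted(final_plan), rounds, agreed)
```

In this toy setting the moderator merges the two missing components after the first round, and all three experts accept the plan in the second. A real system would replace `critique` with model calls and would need a softer notion of agreement than an exact score threshold.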
