CellForge: Agentic Design of Virtual Cell Models

Abstract

Virtual cell modeling is an emerging frontier at the intersection of artificial intelligence and biology. It aims to make quantitative predictions, such as how cells respond to diverse perturbations. However, autonomously building computational models for virtual cells is challenging because of the complexity of biological systems, the heterogeneity of data modalities, and the need for domain-specific expertise across multiple disciplines. Here, we introduce CellForge, an agentic system that uses a multi-agent framework to transform the provided biological datasets and research objectives directly into optimized computational models for virtual cells. More specifically, given only raw single-cell multi-omics data and task descriptions as input, CellForge outputs both an optimized model architecture and executable code for training virtual cell models and running inference. The framework integrates three core modules: Task Analysis, which characterizes the provided dataset and retrieves relevant literature; Method Design, in which specialized agents collaboratively develop optimized modeling strategies; and Experiment Execution, which generates code automatically. The agents in the Method Design module are divided into experts with differing perspectives and a central moderator, and they must collaboratively exchange solutions until they reach a reasonable consensus. We demonstrate CellForge's capabilities in single-cell perturbation prediction using six diverse datasets that encompass gene knockouts, drug treatments, and cytokine stimulations across multiple modalities. CellForge consistently outperforms task-specific state-of-the-art methods. Overall, CellForge demonstrates how iterative interaction between LLM agents with differing perspectives provides better solutions than directly addressing a modeling challenge. Our code is publicly available at https://github.com/gersteinlab/CellForge.
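The expert-and-moderator exchange described in the abstract can be pictured with a minimal Python sketch. This is not CellForge's implementation: the real experts are LLM agents, whereas the experts here are deterministic toy functions, and every name (`Proposal`, `make_expert`, `moderate`, `run_discussion`), the design labels, the confidence values, and the agreement threshold are assumptions chosen only to illustrate an iterate-until-consensus loop.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Proposal:
    author: str        # hypothetical expert name
    design: str        # hypothetical label for a modeling strategy
    confidence: float  # self-reported confidence in [0, 1] (an assumption)


# An expert maps (task description, previous round's proposals) to a proposal.
Expert = Callable[[str, List[Proposal]], Proposal]


def make_expert(name: str, initial_design: str, confidence: float) -> Expert:
    """Toy stand-in for an LLM expert: it starts from its own design, then
    adopts the most confident design it saw in the previous round."""
    def expert(task: str, previous: List[Proposal]) -> Proposal:
        if not previous:
            design = initial_design
        else:
            design = max(previous, key=lambda p: p.confidence).design
        return Proposal(name, design, confidence)
    return expert


def moderate(proposals: List[Proposal], threshold: float) -> Optional[str]:
    """Toy moderator: return a design once the share of experts backing it
    reaches the threshold, otherwise None."""
    votes: Dict[str, int] = {}
    for p in proposals:
        votes[p.design] = votes.get(p.design, 0) + 1
    design, count = max(votes.items(), key=lambda kv: kv[1])
    return design if count / len(proposals) >= threshold else None


def run_discussion(task: str, experts: List[Expert], threshold: float = 1.0,
                   max_rounds: int = 5) -> Tuple[Optional[str], int]:
    """Exchange proposals round by round until the moderator finds consensus
    or the round budget runs out."""
    proposals: List[Proposal] = []
    for round_no in range(1, max_rounds + 1):
        # Each expert sees only the previous round's proposals.
        proposals = [expert(task, proposals) for expert in experts]
        agreed = moderate(proposals, threshold)
        if agreed is not None:
            return agreed, round_no
    return None, max_rounds


if __name__ == "__main__":
    experts = [
        make_expert("data_expert", "transformer", 0.6),
        make_expert("biology_expert", "graph-network", 0.9),
        make_expert("training_expert", "vae", 0.5),
    ]
    print(run_discussion("predict single-cell perturbation response", experts))
```

In this toy run the three experts disagree in the first round and converge in the second, once each has seen the others' proposals. A full-agreement threshold is only one possible stopping rule; the abstract says only that the agents must reach a "reasonable consensus".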

