Towards Autonomous Mechanistic Reasoning in Virtual Cells

Abstract

Large language models (LLMs) have recently gained significant attention as a promising approach to accelerating scientific discovery. However, their application in open-ended scientific domains such as biology remains limited, primarily because they lack factually grounded and actionable explanations. To address this, we introduce a structured explanation formalism for virtual cells that represents biological reasoning as mechanistic action graphs, enabling systematic verification and falsification. Building on this formalism, we propose VCR-Agent, a multi-agent framework that integrates biologically grounded knowledge retrieval with verifier-based filtering to generate and validate mechanistic reasoning autonomously. Using this framework, we release the VC-TRACES dataset, which consists of verified mechanistic explanations derived from the Tahoe-100M atlas. Empirically, we demonstrate that training with these explanations improves factual precision and provides a more effective supervision signal for downstream gene expression prediction. These results underscore the importance of reliable mechanistic reasoning for virtual cells, achieved through the synergy of multi-agent methods and rigorous verification.
