CellForge: Agentic Design of Virtual Cell Models

Abstract

Virtual cell modeling is an emerging frontier at the intersection of artificial intelligence and biology. It aims to make quantitative predictions, such as how cells respond to diverse perturbations. However, autonomously building computational models for virtual cells is challenging because of the complexity of biological systems, the heterogeneity of data modalities, and the need for domain expertise across multiple disciplines. Here, we introduce CellForge, an agentic system built on a multi-agent framework that transforms the provided biological datasets and research objectives directly into optimized computational models for virtual cells. Specifically, given only raw single-cell multi-omics data and a task description as input, CellForge outputs both an optimized model architecture and executable code for training virtual cell models and running inference. The framework integrates three core modules: Task Analysis, which characterizes the provided dataset and retrieves relevant literature; Method Design, in which specialized agents collaboratively develop optimized modeling strategies; and Experiment Execution, which generates code automatically. The agents in the Method Design module are divided into experts with differing perspectives and a central moderator, and they exchange solutions collaboratively until they reach a reasonable consensus. We demonstrate CellForge's capabilities in single-cell perturbation prediction, using six diverse datasets that cover gene knockouts, drug treatments, and cytokine stimulations across multiple modalities. CellForge consistently outperforms task-specific state-of-the-art methods. Overall, CellForge demonstrates how iterative interaction between LLM agents with differing perspectives provides better solutions than addressing a modeling challenge directly. Our code is publicly available at https://github.com/gersteinlab/CellForge.
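The abstract does not say how the experts and the moderator actually reach consensus, so the following is only a minimal sketch of one way such a loop could be organized. Every name here (`Proposal`, `moderate`, `design_by_consensus`, `make_stub_expert`) is hypothetical and is not taken from the CellForge code base. The experts are deterministic stubs standing in for LLM calls, and the stopping rule (all experts propose the shared draft and mean confidence passes a threshold) is an assumption chosen for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Tuple

# Hypothetical types for illustration only; not CellForge's API.

@dataclass
class Proposal:
    author: str
    design: str
    confidence: float  # self-reported confidence in [0, 1]

# An expert maps (task description, current shared draft) to a proposal.
Expert = Callable[[str, str], Proposal]

def moderate(proposals: List[Proposal]) -> Proposal:
    """Moderator stub: adopt the most confident design as the shared draft."""
    best = max(proposals, key=lambda p: p.confidence)
    return Proposal("moderator", best.design, mean(p.confidence for p in proposals))

def design_by_consensus(
    task: str,
    experts: List[Expert],
    threshold: float = 0.75,
    max_rounds: int = 5,
) -> Tuple[str, int, List[Proposal]]:
    """Iterate expert proposals and moderator summaries until consensus.

    Assumed stopping rule: every expert proposes the shared draft and the
    mean confidence is at least `threshold`, or `max_rounds` is reached.
    """
    draft = ""
    rounds = 0
    history: List[Proposal] = []
    while rounds < max_rounds:
        rounds += 1
        proposals = [expert(task, draft) for expert in experts]
        summary = moderate(proposals)
        history.append(summary)
        draft = summary.design
        agreed = all(p.design == draft for p in proposals)
        if agreed and summary.confidence >= threshold:
            break
    return draft, rounds, history

def make_stub_expert(name: str, preferred: str, confidence: float) -> Expert:
    """Build a deterministic stand-in for an LLM-backed expert agent."""
    def expert(task: str, draft: str) -> Proposal:
        if draft:
            # Stub behavior: defer to the shared draft and grow more confident.
            return Proposal(name, draft, min(1.0, confidence + 0.2))
        return Proposal(name, preferred, confidence)
    return expert

if __name__ == "__main__":
    experts = [
        make_stub_expert("data_expert", "transformer encoder", 0.6),
        make_stub_expert("biology_expert", "graph neural network", 0.7),
        make_stub_expert("training_expert", "variational autoencoder", 0.5),
    ]
    design, rounds, _ = design_by_consensus("predict perturbation response", experts)
    print(f"consensus design: {design!r} after {rounds} round(s)")
```

In a real system each expert would critique and revise the draft rather than simply defer to it, and the moderator would synthesize proposals instead of picking one. The sketch only shows the control flow of iterating until agreement.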
