GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis

Abstract

Gene expression analysis holds the key to many biomedical discoveries, yet extracting insights from raw transcriptomic data remains formidable due to the complexity of multiple large, semi-structured files and the need for extensive domain expertise. Current automation approaches are often limited either by inflexible workflows that break down in edge cases or by fully autonomous agents that lack the precision required for rigorous scientific inquiry. GenoMAS charts a different course by presenting a team of LLM-based scientists that integrates the reliability of structured workflows with the adaptability of autonomous agents. GenoMAS orchestrates six specialized LLM agents through typed message-passing protocols, each contributing complementary strengths to a shared analytic canvas. At the heart of GenoMAS lies a guided-planning framework: programming agents unfold high-level task guidelines into Action Units and, at each juncture, elect to advance, revise, bypass, or backtrack, thereby maintaining logical coherence while adapting to the idiosyncrasies of genomic data. On the GenoTEX benchmark, GenoMAS reaches a Composite Similarity Correlation of 89.13% for data preprocessing and an F1 score of 60.48% for gene identification, surpassing the best prior art by 10.61% and 16.85%, respectively. Beyond metrics, GenoMAS surfaces biologically plausible gene-phenotype associations corroborated by the literature, all while adjusting for latent confounders. Code is available at https://github.com/Liu-Hy/GenoMAS.
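The advance/revise/bypass/backtrack choice described in the abstract can be illustrated with a minimal Python sketch. This is not the GenoMAS implementation: the names `Decision`, `ActionUnit`, `PlanResult`, and `run_plan`, the example unit names, and the scripted policy (which stands in for an LLM agent's judgment) are all hypothetical and chosen only to show the control flow.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, Dict, List, Tuple


class Decision(Enum):
    """The four choices named in the abstract (hypothetical encoding)."""
    ADVANCE = auto()    # accept this unit's result and move on
    REVISE = auto()     # retry the same unit
    BYPASS = auto()     # skip this unit and move on
    BACKTRACK = auto()  # undo the previous unit and return to it


@dataclass
class ActionUnit:
    name: str


@dataclass
class PlanResult:
    completed: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)
    trace: List[Tuple[str, Decision]] = field(default_factory=list)


def run_plan(
    units: List[ActionUnit],
    decide: Callable[[ActionUnit, int], Decision],
    max_steps: int = 100,
) -> PlanResult:
    """Walk the units in order, asking `decide(unit, attempt)` at each step."""
    result = PlanResult()
    attempts = [0] * len(units)
    i = 0
    steps = 0
    while i < len(units):
        if steps >= max_steps:  # guard against endless revise/backtrack loops
            raise RuntimeError("plan did not finish within max_steps")
        steps += 1
        unit = units[i]
        decision = decide(unit, attempts[i])
        attempts[i] += 1
        result.trace.append((unit.name, decision))
        if decision is Decision.ADVANCE:
            result.completed.append(unit.name)
            i += 1
        elif decision is Decision.BYPASS:
            result.skipped.append(unit.name)
            i += 1
        elif decision is Decision.BACKTRACK and i > 0:
            i -= 1
            previous = units[i].name
            for bucket in (result.completed, result.skipped):
                if previous in bucket:
                    bucket.remove(previous)
        # REVISE (or BACKTRACK at the first unit): stay on this unit and retry.
    return result


# Scripted stand-in for an agent: (unit name, attempt) -> decision.
script: Dict[Tuple[str, int], Decision] = {
    ("map_gene_ids", 0): Decision.REVISE,
    ("normalize", 0): Decision.BACKTRACK,
    ("optional_qc", 0): Decision.BYPASS,
}
units = [ActionUnit(n) for n in
         ("load_data", "map_gene_ids", "normalize", "optional_qc")]
result = run_plan(units, lambda u, k: script.get((u.name, k), Decision.ADVANCE))
print(result.completed, result.skipped)
```

In this toy run, `map_gene_ids` is revised once, `normalize` triggers a backtrack that forces `map_gene_ids` to be redone, and `optional_qc` is bypassed. A real agent would base each decision on execution results and task context rather than a fixed script.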
