Linking spatial biology and clinical histology via Haiku
April 30, 2026
Authors: Yan Cui, Jacob S. Leiby, Wenhui Lei, Dokyoon Kim, Yanxiang Deng, Aaron T. Mayer, Zhenqin Wu, Alexandro E. Trevino, Zhi Huang
cs.AI
Abstract
Integrating molecular, morphological, and clinical data is essential for basic and translational biomedical research, yet systematic frameworks for jointly modeling these modalities remain limited. Here we present Haiku, a tri-modal contrastive learning model built on multiplexed immunofluorescence (mIF). Its training data comprise 26.7 million spatial proteomics patches from 3,218 tissue sections across 1,606 patients spanning 11 organ types, with matched hematoxylin and eosin (H&E) histology and clinical metadata, all aligned in a shared embedding space. Haiku enables three-way cross-modal retrieval, improves downstream classification and clinical prediction over unimodal baselines, and supports zero-shot biomarker inference through fusion retrieval conditioned on text descriptions derived only from clinical metadata. Across tasks, Haiku outperforms competing approaches, reaching Recall@50 of up to 0.611 (versus a near-zero baseline) in cross-modal retrieval, a C-index of 0.737 (+7.91% relative improvement) in survival prediction, and a mean Pearson correlation of 0.718 across 52 biomarkers in zero-shot biomarker inference. Furthermore, we introduce a counterfactual prediction framework that modifies only the clinical metadata while holding tissue morphology fixed, surfacing niche-specific molecular shifts associated with breast cancer stage progression and lung cancer survival outcomes. In a lung adenocarcinoma case study, the counterfactual analysis recovers niche-specific shifts characterized by increased CD8 and granzyme B, reduced PD-L1, and decreased Ki67, broadly consistent with patterns reported for favorable outcomes. We present these counterfactual results as exploratory, hypothesis-generating signals rather than mechanistic claims. Together, these capabilities show that tri-modal alignment via Haiku enables integrative analysis of spatial biology, bridging molecular measurements with clinical context for biological exploration.
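For readers unfamiliar with the two headline metrics, the following is a minimal sketch of how Recall@K for cross-modal retrieval and Harrell's C-index for survival prediction are commonly computed. It is not the authors' evaluation code: the function names (`cosine`, `recall_at_k`, `concordance_index`), the use of cosine similarity, the tie handling, and the toy embeddings and survival data are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def recall_at_k(queries, gallery, k):
    """Fraction of queries whose matched item (same index) ranks in the top k.

    Assumes queries[i] and gallery[i] embed the same patch in two different
    modalities (for example H&E and mIF) within a shared space.
    """
    hits = 0
    for i, q in enumerate(queries):
        sims = [cosine(q, g) for g in gallery]
        # Rank gallery indices by descending similarity to the query.
        ranked = sorted(range(len(gallery)), key=lambda j: sims[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(queries)


def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs ordered correctly by risk.

    A pair (i, j) is comparable when subject i has the shorter time and an
    observed event. Ties in predicted risk count as half-concordant.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical toy embeddings: three patches seen in two modalities.
he_emb = [[1.0, 0.0], [0.0, 1.0], [0.2, 1.0]]
mif_emb = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]

# Hypothetical survival data: follow-up time, event flag, predicted risk.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]

if __name__ == "__main__":
    print("Recall@1:", recall_at_k(he_emb, mif_emb, 1))
    print("Recall@2:", recall_at_k(he_emb, mif_emb, 2))
    print("C-index:", concordance_index(times, events, risks))
```

On this toy data, two of the three queries retrieve their matched patch at rank 1 and all three do so within rank 2, while four of the five comparable survival pairs are ordered correctly. A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives context for the reported 0.737.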