Lingshu-Cell: A generative cellular world model for transcriptome modeling toward virtual cells
March 26, 2026
Authors: Han Zhang, Guo-Hua Yuan, Chaohao Yuan, Tingyang Xu, Tian Bian, Hong Cheng, Wenbing Huang, Deli Zhao, Yu Rong
cs.AI
Abstract
Modeling cellular states and predicting their responses to perturbations are central challenges in computational biology and the development of virtual cells. Existing foundation models for single-cell transcriptomics provide powerful static representations, but they do not explicitly model the distribution of cellular states for generative simulation. Here, we introduce Lingshu-Cell, a masked discrete diffusion model that learns transcriptomic state distributions and supports conditional simulation under perturbation. By operating directly in a discrete token space that is compatible with the sparse, non-sequential nature of single-cell transcriptomic data, Lingshu-Cell captures complex transcriptome-wide expression dependencies across approximately 18,000 genes without relying on prior gene selection, such as filtering by high variability or ranking by expression level. Across diverse tissues and species, Lingshu-Cell accurately reproduces transcriptomic distributions, marker-gene expression patterns and cell-subtype proportions, demonstrating its ability to capture complex cellular heterogeneity. Moreover, by jointly embedding cell type or donor identity with perturbation, Lingshu-Cell can predict whole-transcriptome expression changes for novel combinations of identity and perturbation. It achieves leading performance on the Virtual Cell Challenge H1 genetic perturbation benchmark and in predicting cytokine-induced responses in human PBMCs. Together, these results establish Lingshu-Cell as a flexible cellular world model for in silico simulation of cell states and perturbation responses, laying the foundation for a new paradigm in biological discovery and perturbation screening.
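To make the phrase "masked discrete diffusion" concrete, the forward (noising) step on a tokenized expression profile can be sketched as below. This is a generic illustration, not the authors' implementation: the abstract does not specify the tokenization, mask token, or noise schedule, so `MASK_ID`, `tokenize_counts`, `forward_mask`, the count cap of 30, and the per-token masking probability `t` are all assumptions made for the example, and the input is synthetic.

```python
import numpy as np

MASK_ID = 0  # hypothetical reserved token id for the mask state


def tokenize_counts(counts, max_count=30):
    """Map raw integer counts to discrete tokens 1..max_count+1.

    Token 0 is reserved for MASK_ID; the cap at max_count is an
    assumption for this sketch, not a detail from the abstract.
    """
    return np.minimum(counts, max_count) + 1


def forward_mask(tokens, t, rng):
    """Replace each token with MASK_ID independently with probability t.

    No gene ordering is used, which matches the non-sequential nature
    of expression data described in the abstract.
    """
    mask = rng.random(tokens.shape) < t
    noised = np.where(mask, MASK_ID, tokens)
    return noised, mask


rng = np.random.default_rng(0)
# Synthetic sparse profile over ~18,000 genes (not real data).
counts = rng.poisson(0.3, size=18000)
tokens = tokenize_counts(counts)
noised, mask = forward_mask(tokens, t=0.5, rng=rng)
```

In masked diffusion models generally, a network is then trained to recover the original tokens at the masked positions given the unmasked ones, and sampling proceeds by iteratively unmasking. Conditioning on cell type, donor, or perturbation would enter as extra inputs to that network, which this sketch leaves out.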