Distribution-Conditioned Transport

Abstract

Learning a transport model that maps a source distribution to a target distribution is a canonical problem in machine learning, but scientific applications increasingly require models that generalize to source and target distributions unseen during training. We introduce distribution-conditioned transport (DCT), a framework that conditions transport maps on learned embeddings of the source and target distributions, enabling generalization to unseen distribution pairs. DCT also enables semi-supervised learning for distributional forecasting problems: because it learns from arbitrary distribution pairs, it can leverage distributions observed under only one condition to improve transport prediction. DCT is agnostic to the underlying transport mechanism, supporting models ranging from flow matching to distributional divergence-based models (e.g., Wasserstein, MMD). We demonstrate the practical performance benefits of DCT on synthetic benchmarks and four applications in biology: batch effect transfer in single-cell genomics, perturbation prediction from mass cytometry data, learning clonal transcriptional dynamics in hematopoiesis, and modeling T-cell receptor sequence evolution.
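
To make the idea of conditioning a transport map on distribution embeddings concrete, the following is a minimal sketch and not the authors' implementation. It assumes a mean-pooled, permutation-invariant encoder (`embed_distribution`), a linear displacement map (`conditioned_transport`), and a Gaussian-kernel MMD as the divergence (`rbf_mmd2`). All function names, shapes, and the untrained weights are hypothetical choices made for illustration only.

```python
import numpy as np

def embed_distribution(samples, proj):
    """Hypothetical encoder: mean-pool tanh features over the sample set.
    Mean pooling makes the embedding invariant to sample order."""
    return np.tanh(samples @ proj).mean(axis=0)

def conditioned_transport(x, src_emb, tgt_emb, weights):
    """Hypothetical conditioned map: move each point by a linear displacement
    computed from the point and both distribution embeddings."""
    cond = np.concatenate([src_emb, tgt_emb])
    inputs = np.hstack([x, np.tile(cond, (x.shape[0], 1))])
    return x + inputs @ weights

def rbf_mmd2(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD with a Gaussian (RBF) kernel."""
    def kernel(a, b):
        sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()

rng = np.random.default_rng(0)
dim, emb_dim = 2, 4
source = rng.normal(0.0, 1.0, size=(200, dim))   # samples from a toy source
target = rng.normal(3.0, 1.0, size=(200, dim))   # samples from a toy target

proj = rng.normal(size=(dim, emb_dim))           # untrained encoder weights
weights = np.zeros((dim + 2 * emb_dim, dim))     # untrained map (identity)

src_emb = embed_distribution(source, proj)
tgt_emb = embed_distribution(target, proj)
moved = conditioned_transport(source, src_emb, tgt_emb, weights)

# A divergence-based variant would minimize this loss over `proj` and
# `weights` across many (source, target) pairs.
loss = rbf_mmd2(moved, target)
```

In this sketch the same parameters serve every distribution pair, and only the embeddings change between pairs. That sharing is what would let a trained model be applied to a pair it has not seen, under the assumptions stated above.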
