Distribution-Conditioned Transport

Abstract

Learning a transport model that maps a source distribution to a target distribution is a canonical problem in machine learning, but scientific applications increasingly require models that can generalize to source and target distributions unseen during training. We introduce distribution-conditioned transport (DCT), a framework that conditions transport maps on learned embeddings of source and target distributions, enabling generalization to unseen distribution pairs. DCT also allows semi-supervised learning for distributional forecasting problems: because it learns from arbitrary distribution pairs, it can leverage distributions observed at only one condition to improve transport prediction. DCT is agnostic to the underlying transport mechanism, supporting models ranging from flow matching to distributional divergence-based models (e.g., Wasserstein, MMD). We demonstrate the practical performance benefits of DCT on synthetic benchmarks and four applications in biology: batch effect transfer in single-cell genomics, perturbation prediction from mass cytometry data, learning clonal transcriptional dynamics in hematopoiesis, and modeling T-cell receptor sequence evolution.
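
To make the conditioning idea concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes a mean-pooled set encoder for the distribution embeddings, a one-hidden-layer residual transport map, and a Gaussian-kernel MMD as the divergence. All function names, dimensions, and the random untrained weights are hypothetical choices made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, h = 2, 4, 8  # data dim, embedding dim, hidden width (all assumed)

# Untrained random weights; a real model would learn these.
W = rng.normal(size=(d, k))        # set-encoder features
A = rng.normal(size=(d, h))        # sample -> hidden
C = rng.normal(size=(2 * k, h))    # (source, target) embeddings -> hidden
B = 0.1 * rng.normal(size=(h, d))  # hidden -> displacement


def embed_distribution(samples):
    """Permutation-invariant embedding: mean-pool per-sample features."""
    return np.tanh(samples @ W).mean(axis=0)


def conditioned_map(x, z_src, z_tgt):
    """Hypothetical transport map conditioned on both embeddings."""
    cond = np.concatenate([z_src, z_tgt])   # shape (2 * k,)
    hidden = np.tanh(x @ A + cond @ C)      # cond term broadcasts over rows
    return x + hidden @ B                   # residual displacement


def rbf_mmd2(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD with a Gaussian (RBF) kernel."""
    def kernel(a, b):
        sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq / (2.0 * bandwidth ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()


# One (source, target) pair given as sample sets of different sizes.
src = rng.normal(loc=0.0, size=(200, d))
tgt = rng.normal(loc=3.0, size=(300, d))

z_src, z_tgt = embed_distribution(src), embed_distribution(tgt)
moved = conditioned_map(src, z_src, z_tgt)
loss = rbf_mmd2(moved, tgt)  # divergence a training loop would minimize
```

Under these assumptions, training would minimize `loss` over many distribution pairs, and an unseen pair is handled at test time by computing its two embeddings and applying the same map. Replacing the MMD term with a different objective, such as a flow matching loss, would leave the conditioning pathway unchanged.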

