Distribution-Conditioned Transport

Abstract

Learning a transport model that maps a source distribution to a target distribution is a canonical problem in machine learning, but scientific applications increasingly require models that generalize to source and target distributions unseen during training. We introduce distribution-conditioned transport (DCT), a framework that conditions transport maps on learned embeddings of the source and target distributions, enabling generalization to unseen distribution pairs. DCT also enables semi-supervised learning for distributional forecasting problems: because it learns from arbitrary distribution pairs, it can leverage distributions observed under only one condition to improve transport prediction. DCT is agnostic to the underlying transport mechanism, supporting models ranging from flow matching to distributional divergence-based models (e.g., Wasserstein, MMD). We demonstrate the practical performance benefits of DCT on synthetic benchmarks and four applications in biology: batch effect transfer in single-cell genomics, perturbation prediction from mass cytometry data, learning clonal transcriptional dynamics in hematopoiesis, and modeling T-cell receptor sequence evolution.
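To make the conditioning idea concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify any architecture, so everything here is an assumption for illustration: the mean-pooled set embedding, the residual form of the map, the random untrained weights, and the hypothetical names `embed_distribution`, `conditioned_map`, and `mmd2_rbf`. It shows one way a transport map could take embeddings of both the source and target sample sets as input, with a squared MMD between the pushed-forward samples and the target as one possible divergence-based objective.

```python
import numpy as np

def embed_distribution(samples, W):
    """Permutation-invariant embedding (assumed): mean of tanh features over the set."""
    return np.tanh(samples @ W).mean(axis=0)

def conditioned_map(x, z_src, z_tgt, A, B):
    """Toy residual map: each point is shifted by a function of (x, z_src, z_tgt)."""
    cond = np.concatenate([z_src, z_tgt])
    cond_tiled = np.tile(cond, (x.shape[0], 1))   # same conditioning for every point
    h = np.tanh(np.concatenate([x, cond_tiled], axis=1) @ A)
    return x + h @ B

def mmd2_rbf(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD with a Gaussian kernel."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

# Synthetic source and target sample sets (illustrative data only).
rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(200, 2))
tgt = rng.normal(3.0, 0.5, size=(200, 2))

# Random, untrained weights; a real model would learn these across many pairs.
W = rng.normal(size=(2, 4))
A = rng.normal(size=(2 + 8, 16)) * 0.1
B = rng.normal(size=(16, 2)) * 0.1

z_src = embed_distribution(src, W)
z_tgt = embed_distribution(tgt, W)
pushed = conditioned_map(src, z_src, z_tgt, A, B)
loss = mmd2_rbf(pushed, tgt)   # quantity a divergence-based variant might minimize
```

Because the map receives `z_src` and `z_tgt` as inputs rather than being fit to a single pair, the same weights could in principle be applied to a distribution pair not seen during training, which is the kind of generalization the abstract describes.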
