LatentUMM: Dual Latent Alignment for Unified Multimodal Models
May 18, 2026
Authors: Yinyi Luo, Wenwen Wang, Hayes Bai, Marios Savvides, Jindong Wang
cs.AI
Abstract
Unified multimodal models (UMMs) achieve strong performance in both understanding and generation by learning a shared latent space, yet they often exhibit functional inconsistency between these two capabilities. We observe that this issue does not stem from a lack of shared representations, but from the absence of explicit alignment between the transformations that map into and out of the latent space. As a result, generation and re-encoding can follow inconsistent trajectories, leading to semantic drift under modality transitions. In this work, we propose LatentUMM, a framework that constructs an enhanced shared latent space to explicitly align these transformations and improve cross-modal consistency. LatentUMM consists of two stages. First, dual latent alignment enforces consistency at both the modality and capacity levels: cross-modal alignment uses a stronger embedding model to impose structured cross-modal semantics, while dual capacity alignment enforces bidirectional consistency under generation and re-encoding. Second, latent dynamics stabilization improves robustness via stochastic latent rollouts and preference optimization, favoring trajectories that better preserve semantic consistency. Experiments show that LatentUMM consistently improves multimodal consistency across diverse architectures. Code is available at: https://github.com/AIFrontierLab/TorchUMM/tree/main/src/umm/post_training/LatentUMM.
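To make the two ideas in the abstract concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes that "bidirectional consistency under generation and re-encoding" can be measured as a round-trip distance in latent space, and that "stochastic latent rollouts and preference optimization" starts by ranking noise-perturbed latents by that distance to form a preferred/rejected pair. All names here (`cycle_inconsistency`, `pick_preference_pair`, the linear `decode`/`encode` maps, the noise scale) are hypothetical stand-ins; a real UMM would use its own generator and encoder.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def cycle_inconsistency(z, decode, encode):
    """1 - cos(z, encode(decode(z))): zero when the round trip returns to z."""
    return 1.0 - cosine(z, encode(decode(z)))


def pick_preference_pair(z, decode, encode, n_rollouts=8, noise=0.1, seed=0):
    """Sample perturbed latents and rank them by round-trip drift from z.

    Returns (preferred, rejected, preferred_score, rejected_score), where a
    lower score means the rollout preserved the original semantics better.
    """
    rng = np.random.default_rng(seed)
    rollouts = [z + noise * rng.standard_normal(z.shape) for _ in range(n_rollouts)]
    scores = [1.0 - cosine(z, encode(decode(r))) for r in rollouts]
    best, worst = int(np.argmin(scores)), int(np.argmax(scores))
    return rollouts[best], rollouts[worst], scores[best], scores[worst]


# Toy setup (assumption): linear maps stand in for generation and re-encoding.
rng = np.random.default_rng(42)
W = rng.standard_normal((16, 8))           # latent (8-d) -> "output" (16-d)
E_exact = np.linalg.pinv(W)                # perfectly aligned re-encoder
E_drift = E_exact + 0.05 * rng.standard_normal(E_exact.shape)  # misaligned one

decode = lambda z: W @ z
encode_exact = lambda x: E_exact @ x
encode_drift = lambda x: E_drift @ x

z = rng.standard_normal(8)
aligned_gap = cycle_inconsistency(z, decode, encode_exact)
drift_gap = cycle_inconsistency(z, decode, encode_drift)
preferred, rejected, s_best, s_worst = pick_preference_pair(z, decode, encode_drift)

print("aligned round-trip gap:", aligned_gap)
print("misaligned round-trip gap:", drift_gap)
print("preferred vs rejected rollout score:", s_best, s_worst)
```

In this sketch the aligned encoder yields a round-trip gap near zero, while the perturbed encoder drifts; the preferred/rejected pair is the kind of input a preference-optimization objective would consume, although the objective itself is left out because the abstract does not specify it.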