UniPath: Adaptive Coordination of Understanding and Generation for Unified Multimodal Reasoning
May 12, 2026
Authors: Hayes Bai, Yinyi Luo, Wenwen Wang, Qingsong Wen, Jindong Wang
cs.AI
Abstract
Unified multimodal models (UMMs) aim to integrate understanding and generation within a single architecture. However, how to coordinate these two capabilities for more effective and efficient reasoning remains underexplored. Existing coordination approaches either couple the two capabilities during training, without explicit inference-time coordination, or impose a fixed coordination pattern on all inputs. In this work, we show that multimodal tasks exhibit substantial coordination-path diversity: different inputs favor different coordination paths. This suggests that exploiting such diversity is key to improving performance. We propose UniPath, a framework for adaptively modeling and exploiting coordination-path diversity. Instead of enforcing a single coordination pattern, we represent task solving as the selection and execution of a path, ranging from direct answering to textual inference, visual-thought construction, and hypothesis-based exploration. We construct role-aligned trajectories to train a path-conditioned executor and introduce a lightweight planner mechanism to enable input-dependent path selection. Experiments show that leveraging coordination-path diversity improves performance over fixed coordination strategies while providing interpretable intermediate behaviors. The code is available at: https://github.com/AIFrontierLab/TorchUMM/tree/main/src/umm/post_training/unipath.
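The select-then-execute structure described in the abstract can be illustrated with a minimal Python sketch. This is not the UniPath implementation: the names `Path`, `Query`, `toy_planner`, `toy_executor`, and `solve` are hypothetical, the keyword rules stand in for a learned planner, and the step lists are only an assumption about what each of the four named paths might involve.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Path(Enum):
    """The four coordination paths named in the abstract."""
    DIRECT_ANSWER = "direct_answer"
    TEXTUAL_INFERENCE = "textual_inference"
    VISUAL_THOUGHT = "visual_thought"
    HYPOTHESIS_EXPLORATION = "hypothesis_exploration"


@dataclass
class Query:
    text: str
    has_image: bool = False


def toy_planner(query: Query) -> Path:
    """Hypothetical stand-in for a learned planner: picks a path from keywords."""
    words = query.text.lower()
    if "what if" in words:
        return Path.HYPOTHESIS_EXPLORATION
    if query.has_image and any(k in words for k in ("rotate", "draw", "fold")):
        return Path.VISUAL_THOUGHT
    if any(k in words for k in ("why", "how many", "compare")):
        return Path.TEXTUAL_INFERENCE
    return Path.DIRECT_ANSWER


# Assumed step lists per path; a real executor would call the model at each step.
_STEPS = {
    Path.DIRECT_ANSWER: ["answer"],
    Path.TEXTUAL_INFERENCE: ["reason in text", "answer"],
    Path.VISUAL_THOUGHT: ["generate intermediate image", "inspect image", "answer"],
    Path.HYPOTHESIS_EXPLORATION: [
        "generate candidate hypotheses",
        "evaluate each candidate",
        "answer",
    ],
}


def toy_executor(query: Query, path: Path) -> List[str]:
    """Path-conditioned execution: returns the step trace for the chosen path."""
    return list(_STEPS[path])


def solve(query: Query) -> Tuple[Path, List[str]]:
    """Select a path for this input, then execute it."""
    path = toy_planner(query)
    return path, toy_executor(query, path)


if __name__ == "__main__":
    for q in (
        Query("What color is the car?", has_image=True),
        Query("Rotate the shape and describe it", has_image=True),
        Query("What if the block is removed?"),
    ):
        path, trace = solve(q)
        print(f"{q.text!r}: {path.value}: {' -> '.join(trace)}")
```

The point of the sketch is the separation of concerns: the planner sees only the input and returns a path label, while the executor is conditioned on that label, so the intermediate trace stays inspectable for each input.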