UniPath: Adaptive Coordination of Understanding and Generation for Unified Multimodal Reasoning
May 12, 2026
Authors: Hayes Bai, Yinyi Luo, Wenwen Wang, Qingsong Wen, Jindong Wang
cs.AI
Abstract
Unified multimodal models (UMMs) aim to integrate understanding and generation within a single architecture. However, how to coordinate these two capabilities for more effective and efficient reasoning remains underexplored. Existing coordination approaches either couple the two capabilities during training, without explicit inference-time coordination, or impose a fixed coordination pattern on all inputs. In this work, we show that multimodal tasks exhibit substantial coordination-path diversity: different inputs favor different coordination paths. This suggests that exploiting such diversity is key to improving performance. We propose UniPath, a framework for adaptively modeling and exploiting coordination-path diversity. Instead of enforcing a single coordination pattern, we represent task solving as the selection and execution of a path, ranging from direct answering to textual inference, visual-thought construction, and hypothesis-based exploration. We construct role-aligned trajectories to train a path-conditioned executor and introduce a lightweight planner mechanism to enable input-dependent path selection. Experiments show that leveraging coordination-path diversity improves performance over fixed coordination strategies while providing interpretable intermediate behaviors. The code is available at: https://github.com/AIFrontierLab/TorchUMM/tree/main/src/umm/post_training/unipath.
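The select-then-execute idea described above can be pictured with a small dispatch sketch. This is only an illustration under stated assumptions, not the authors' implementation: the names `Path`, `Query`, `toy_planner`, `EXECUTORS`, and `solve` are hypothetical, the planner is a keyword heuristic standing in for the learned lightweight planner, and each executor returns placeholder step names rather than calling a model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Path(Enum):
    # The four path families named in the abstract.
    DIRECT_ANSWER = "direct_answer"
    TEXTUAL_INFERENCE = "textual_inference"
    VISUAL_THOUGHT = "visual_thought"
    HYPOTHESIS_EXPLORATION = "hypothesis_exploration"


@dataclass
class Query:
    question: str
    has_image: bool = False


def toy_planner(query: Query) -> Path:
    """Hypothetical stand-in for the planner: simple keyword rules only."""
    text = query.question.lower()
    if "what if" in text:
        return Path.HYPOTHESIS_EXPLORATION
    if query.has_image and any(w in text for w in ("draw", "rotate", "imagine")):
        return Path.VISUAL_THOUGHT
    if any(w in text for w in ("why", "how many", "explain")):
        return Path.TEXTUAL_INFERENCE
    return Path.DIRECT_ANSWER


# Placeholder executors (assumed step names): each returns the trace of
# steps it would run, which is what makes the chosen path inspectable.
EXECUTORS: Dict[Path, Callable[[Query], List[str]]] = {
    Path.DIRECT_ANSWER: lambda q: ["answer"],
    Path.TEXTUAL_INFERENCE: lambda q: ["reason_in_text", "answer"],
    Path.VISUAL_THOUGHT: lambda q: ["generate_intermediate_image", "inspect_image", "answer"],
    Path.HYPOTHESIS_EXPLORATION: lambda q: ["propose_hypotheses", "test_hypotheses", "answer"],
}


def solve(query: Query, planner: Callable[[Query], Path] = toy_planner) -> Tuple[Path, List[str]]:
    """Select a path for this input, then run the executor conditioned on it."""
    path = planner(query)
    return path, EXECUTORS[path](query)
```

Calling `solve(Query("Imagine the cube rotated; which face is on top?", has_image=True))` would route the query to the visual-thought path, while a plain lookup question falls through to direct answering. A fixed coordination strategy corresponds to passing a planner that always returns the same `Path`.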