MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation
November 12, 2025
Authors: Ye Tian, Ling Yang, Jiongfan Yang, Anran Wang, Yu Tian, Jiani Zheng, Haochen Wang, Zhiyang Teng, Zhuochen Wang, Yinjie Wang, Yunhai Tong, Mengdi Wang, Xiangtai Li
cs.AI
Abstract
While thinking-aware generation aims to improve performance on complex tasks, we identify a critical failure mode where existing sequential, autoregressive approaches can paradoxically degrade performance due to error propagation. To systematically analyze this issue, we propose ParaBench, a new benchmark designed to evaluate both text and image output modalities. Our analysis using ParaBench reveals that this performance degradation is strongly correlated with poor alignment between the generated reasoning and the final image. To resolve this, we propose a parallel multimodal diffusion framework, MMaDA-Parallel, that enables continuous, bidirectional interaction between text and images throughout the entire denoising trajectory. MMaDA-Parallel is trained with supervised finetuning and then further optimized by Parallel Reinforcement Learning (ParaRL), a novel strategy that applies semantic rewards along the trajectory to enforce cross-modal consistency. Experiments validate that our model significantly improves cross-modal alignment and semantic consistency, achieving a 6.9% improvement in Output Alignment on ParaBench compared to the state-of-the-art model, Bagel, establishing a more robust paradigm for thinking-aware image synthesis. Our code is open-sourced at https://github.com/tyfeld/MMaDA-Parallel
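The abstract contrasts sequential text-then-image decoding with joint denoising in which both modalities are refined together, and describes ParaRL as scoring intermediate states along the trajectory rather than only the final output. The toy sketch below illustrates those two ideas. It is not the MMaDA-Parallel implementation. It assumes a masked-token diffusion process over one concatenated text+image sequence, unmasks positions in a random order (real samplers usually pick positions by model confidence), and uses a hypothetical `reward_fn` as a stand-in for the semantic reward. All names (`MASK`, `parallel_denoise`, `trajectory_reward`, `toy_predict`, `toy_alignment`) are illustrative and do not come from the paper's codebase.

```python
import random
from typing import Callable, List, Sequence

MASK = -1  # hypothetical mask-token id used by this toy example


def parallel_denoise(
    text_len: int,
    image_len: int,
    predict: Callable[[List[int]], List[int]],
    steps: int,
    seed: int = 0,
) -> List[List[int]]:
    """Jointly unmask one concatenated text+image token sequence.

    At every step `predict` sees the whole current sequence (both modalities),
    so partially decoded text can inform image tokens and vice versa.
    Returns the sequence after each step, i.e. the denoising trajectory.
    """
    if steps <= 0:
        raise ValueError("steps must be positive")
    rng = random.Random(seed)
    seq = [MASK] * (text_len + image_len)
    order = list(range(len(seq)))
    rng.shuffle(order)  # simplification: random order instead of confidence-based
    per_step = -(-len(seq) // steps)  # ceiling division
    trajectory: List[List[int]] = []
    while order:
        proposal = predict(seq)  # one prediction per position, from the full context
        reveal, order = order[:per_step], order[per_step:]
        for pos in reveal:
            seq[pos] = proposal[pos]
        trajectory.append(list(seq))
    return trajectory


def trajectory_reward(
    trajectory: List[List[int]],
    text_len: int,
    reward_fn: Callable[[Sequence[int], Sequence[int]], float],
) -> float:
    """Average a cross-modal reward over every intermediate state, not just the last."""
    if not trajectory:
        return 0.0
    scores = [reward_fn(state[:text_len], state[text_len:]) for state in trajectory]
    return sum(scores) / len(scores)


def toy_predict(seq: List[int], text_len: int = 4) -> List[int]:
    """Toy stand-in model: text position i predicts token i; each image position
    copies its aligned text token if already decoded, otherwise guesses 0."""
    out = []
    for i in range(len(seq)):
        if i < text_len:
            out.append(i)
        else:
            t = seq[i - text_len]
            out.append(t if t != MASK else 0)
    return out


def toy_alignment(text: Sequence[int], image: Sequence[int]) -> float:
    """Toy reward: agreement rate over aligned positions where both are decoded."""
    pairs = [(t, v) for t, v in zip(text, image) if t != MASK and v != MASK]
    if not pairs:
        return 0.0
    return sum(t == v for t, v in pairs) / len(pairs)


if __name__ == "__main__":
    traj = parallel_denoise(4, 4, toy_predict, steps=4)
    print(traj[-1])
    print(trajectory_reward(traj, 4, toy_alignment))
```

Rewriting the loop so that every text position is unmasked before any image position yields the sequential ordering that the abstract argues against. In that ordering, text can no longer be influenced by partially decoded image tokens. Averaging over the trajectory also means that early misalignments lower the reward even if the final state happens to agree.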