

MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation

November 12, 2025
Authors: Ye Tian, Ling Yang, Jiongfan Yang, Anran Wang, Yu Tian, Jiani Zheng, Haochen Wang, Zhiyang Teng, Zhuochen Wang, Yinjie Wang, Yunhai Tong, Mengdi Wang, Xiangtai Li
cs.AI

Abstract

While thinking-aware generation aims to improve performance on complex tasks, we identify a critical failure mode where existing sequential, autoregressive approaches can paradoxically degrade performance due to error propagation. To systematically analyze this issue, we propose ParaBench, a new benchmark designed to evaluate both text and image output modalities. Our analysis using ParaBench reveals that this performance degradation is strongly correlated with poor alignment between the generated reasoning and the final image. To resolve this, we propose a parallel multimodal diffusion framework, MMaDA-Parallel, that enables continuous, bidirectional interaction between text and images throughout the entire denoising trajectory. MMaDA-Parallel is trained with supervised fine-tuning and then further optimized by Parallel Reinforcement Learning (ParaRL), a novel strategy that applies semantic rewards along the trajectory to enforce cross-modal consistency. Experiments validate that our model significantly improves cross-modal alignment and semantic consistency, achieving a 6.9% improvement in Output Alignment on ParaBench compared to the state-of-the-art model, Bagel, establishing a more robust paradigm for thinking-aware image synthesis. Our code is open-sourced at https://github.com/tyfeld/MMaDA-Parallel
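The core idea in the abstract, parallel denoising of text and image with a semantic reward collected along the trajectory rather than only at the end, can be illustrated with a minimal toy sketch. Everything below (the toy masked-denoising step, the stand-in reward, all names and constants) is a hypothetical illustration of the described idea, not the authors' implementation.

```python
import numpy as np

# Toy sketch of parallel masked denoising: text and image token sequences
# are unmasked jointly at every step (each modality conditions on the
# other's *current* state), and a per-step semantic reward is recorded
# along the trajectory, ParaRL-style. All of this is illustrative only.

rng = np.random.default_rng(0)
VOCAB, TEXT_LEN, IMG_LEN, STEPS, MASK = 16, 8, 8, 4, -1

def toy_denoise(tokens, other_modality, frac_to_unmask):
    """Stand-in for one masked-diffusion step: reveal a fraction of the
    still-masked positions. A real model would predict these tokens
    conditioned on `other_modality`; here we just sample randomly."""
    out = tokens.copy()
    masked = np.flatnonzero(out == MASK)
    if masked.size == 0:
        return out
    k = max(1, int(masked.size * frac_to_unmask))
    for i in rng.choice(masked, size=k, replace=False):
        out[i] = rng.integers(0, VOCAB)  # never MASK, so progress is monotone
    return out

def semantic_reward(text, image):
    """Stand-in for a cross-modal consistency score at one step."""
    return float(np.mean((text != MASK) & (image != MASK)))

text = np.full(TEXT_LEN, MASK)
image = np.full(IMG_LEN, MASK)
trajectory_rewards = []
for step in range(STEPS):
    # Parallel update: both modalities are denoised in the same step,
    # each seeing the other's current partial state (bidirectional
    # interaction), instead of text-first then image-second.
    frac = 1.0 / (STEPS - step)
    text, image = (toy_denoise(text, image, frac),
                   toy_denoise(image, text, frac))
    trajectory_rewards.append(semantic_reward(text, image))

# A ParaRL-style objective would optimize the rewards accumulated along
# the whole trajectory, not only a score on the final sample.
```

The contrast with the sequential, autoregressive pipeline criticized in the abstract is that here neither modality is finalized before the other begins, so an early reasoning error can still be corrected by later image states and vice versa.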