LLaDA-o: An Effective and Length-Adaptive Omni Diffusion Model
March 1, 2026
Authors: Zebin You, Xiaolu Zhang, Jun Zhou, Chongxuan Li, Ji-Rong Wen
cs.AI
Abstract
We present LLaDA-o, an effective and length-adaptive omni diffusion model for multimodal understanding and generation. LLaDA-o is built on a Mixture of Diffusion (MoD) framework that decouples discrete masked diffusion for text understanding from continuous diffusion for visual generation, while coupling the two through a shared, simple, and efficient attention backbone that reduces redundant computation over fixed conditions. Building on MoD, we further introduce a data-centric length-adaptation strategy that enables flexible-length decoding in multimodal settings without architectural changes. Extensive experiments show that LLaDA-o achieves state-of-the-art performance among omni diffusion models on multimodal understanding and generation benchmarks, reaching 87.04 on DPG-Bench for text-to-image generation and demonstrating the effectiveness of unified omni diffusion modeling. Code is available at https://github.com/ML-GSAI/LLaDA-o.
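To make the MoD idea concrete, the following is a minimal, hypothetical sketch (not the paper's implementation) of how a single shared backbone can serve two diffusion losses: a discrete masked-token objective for text and a continuous noise-regression objective for image latents. The backbone, embedding table, mask ratio, and all shapes are illustrative assumptions; the real model uses an attention network rather than the toy linear map used here.

```python
import numpy as np

# Toy Mixture-of-Diffusion (MoD) training step: discrete masked diffusion for
# text and continuous diffusion for image latents share one backbone.
# All names and hyperparameters below are illustrative assumptions.

rng = np.random.default_rng(0)
VOCAB, DIM, MASK_ID = 100, 16, 99          # MASK_ID is a reserved mask token

W = rng.normal(size=(DIM, DIM)) * 0.1      # shared "backbone" weights (stand-in)
E = rng.normal(size=(VOCAB, DIM)) * 0.1    # token embedding table

def backbone(h):
    """Stand-in for the shared attention backbone: one nonlinear layer."""
    return np.tanh(h @ W)

def text_masked_diffusion_loss(tokens, mask_ratio=0.5):
    """Discrete masked diffusion: mask tokens, predict the originals."""
    mask = rng.random(len(tokens)) < mask_ratio
    corrupted = np.where(mask, MASK_ID, tokens)
    h = backbone(E[corrupted])
    logits = h @ E.T                       # scores over the vocabulary
    p = np.exp(logits - logits.max(-1, keepdims=True))
    p /= p.sum(-1, keepdims=True)
    # Cross-entropy only on the masked positions, as in masked diffusion.
    return -np.log(p[mask, tokens[mask]] + 1e-9).mean()

def image_continuous_diffusion_loss(latents, t=0.5):
    """Continuous diffusion: add Gaussian noise, regress it (eps-prediction)."""
    eps = rng.normal(size=latents.shape)
    noisy = np.sqrt(1.0 - t) * latents + np.sqrt(t) * eps
    eps_hat = backbone(noisy)              # same shared backbone as for text
    return ((eps_hat - eps) ** 2).mean()

tokens = rng.integers(0, VOCAB - 1, size=32)
latents = rng.normal(size=(32, DIM))
loss = text_masked_diffusion_loss(tokens) + image_continuous_diffusion_loss(latents)
print(float(loss))
```

In a real system both losses would backpropagate into the shared backbone, which is what couples the two modalities while each keeps its own diffusion process.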