

OmniAlpha: A Sequence-to-Sequence Framework for Unified Multi-Task RGBA Generation

November 25, 2025
Authors: Hao Yu, Jiabo Zhan, Zile Wang, Jinglin Wang, Huaisong Zhang, Hongyu Li, Xinrui Chen, Yongxian Wei, Chun Yuan
cs.AI

Abstract

Generative models have excelled in RGB synthesis, but real-world applications require RGBA manipulation. This has led to a fragmented landscape: specialized, single-task models handle alpha but lack versatility, while unified multi-task frameworks are confined to the RGB domain. To bridge this critical gap, we propose OmniAlpha, the first unified, multi-task generative framework for sequence-to-sequence RGBA image generation and editing. Its architecture features MSRoPE-BiL, a novel RoPE method with a bi-directionally extendable layer axis for its Diffusion Transformer (DiT) backbone, enabling the concurrent processing of multiple input and target RGBA layers. To power this framework, we introduce AlphaLayers, a new dataset of 1,000 high-quality, multi-layer triplets, built via a novel automated synthesis and filter pipeline. Jointly training OmniAlpha on this dataset across a comprehensive suite of 21 diverse tasks, extensive experiments demonstrate that our unified approach consistently outperforms strong, specialized baselines. Most notably, OmniAlpha achieves a dramatic 84.8% relative reduction in SAD for mask-free matting on AIM-500 and wins over 90% of human preferences in layer-conditioned completion. Our work proves that a unified, multi-task model can learn a superior shared representation for RGBA, paving the way for more powerful, layer-aware generative systems.
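The abstract names MSRoPE-BiL as a RoPE variant with a bi-directionally extendable layer axis but gives no implementation details. As a rough illustration of the general idea only — multi-axis rotary embeddings where one positional axis is a signed layer index, so input layers can sit at negative coordinates and target layers at non-negative ones — here is a minimal NumPy sketch. The function names, the equal three-way channel split, and the sign convention are all assumptions for illustration, not the paper's actual design.

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    """Standard 1-D rotary embedding: rotate consecutive channel pairs
    of x (last dim, even-sized) by angles pos * base^(-2i/d)."""
    d = x.shape[-1]
    inv_freq = base ** (-np.arange(0, d, 2) / d)   # (d/2,)
    angles = pos * inv_freq
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def multi_axis_rope(x, h, w, layer):
    """Hypothetical multi-axis RoPE: split the channels into three equal
    groups and rotate each group by its own coordinate -- height, width,
    and a *signed* layer index. Because `layer` may be negative (input
    layers) or non-negative (target layers), the layer axis extends in
    both directions, loosely mirroring the 'bi-directionally extendable
    layer axis' described in the abstract."""
    d = x.shape[-1]
    assert d % 6 == 0      # three axes, each needing an even channel count
    g = d // 3
    return np.concatenate([
        rope_rotate(x[..., :g], h),
        rope_rotate(x[..., g:2 * g], w),
        rope_rotate(x[..., 2 * g:], layer),
    ], axis=-1)
```

Because each axis is a pure rotation, the embedding preserves vector norms, and attention scores between two rotated vectors depend only on their relative offset along each axis — the property that makes RoPE attractive for variable numbers of layers.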
PDF · December 1, 2025