SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices
January 13, 2026
Authors: Dongting Hu, Aarush Gupta, Magzhan Gabidolla, Arpit Sahni, Huseyin Coskun, Yanyu Li, Yerlan Idelbayev, Ahsan Mahmood, Aleksei Lebedev, Dishani Lahiri, Anujraaj Goyal, Ju Hu, Mingming Gong, Sergey Tulyakov, Anil Kag
cs.AI
Abstract
Recent advances in diffusion transformers (DiTs) have set new standards in image generation, yet these models remain impractical for on-device deployment due to their high computational and memory costs. In this work, we present an efficient DiT framework tailored for mobile and edge devices that achieves transformer-level generation quality under strict resource constraints. Our design combines three key components. First, we propose a compact DiT architecture with an adaptive global-local sparse attention mechanism that balances global context modeling with local detail preservation. Second, we introduce an elastic training framework that jointly optimizes sub-DiTs of varying capacities within a unified supernetwork, allowing a single model to adapt dynamically for efficient inference across diverse hardware. Finally, we develop Knowledge-Guided Distribution Matching Distillation, a step-distillation pipeline that integrates the DMD objective with knowledge transfer from few-step teacher models, producing high-fidelity, low-latency generation (e.g., in 4 steps) suitable for real-time on-device use. Together, these contributions enable scalable, efficient, and high-quality diffusion models for deployment on diverse hardware.
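To make the first component concrete, below is a minimal PyTorch sketch of one plausible reading of "adaptive global-local sparse attention": a local branch attending within non-overlapping windows, a global branch attending to a strided subset of tokens, and a learned per-token gate blending the two. The abstract does not specify the mechanism, so the class name, window size, stride, and gating scheme are all illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalLocalSparseAttention(nn.Module):
    """Hypothetical sketch: windowed local attention + strided global
    attention, mixed by a learned sigmoid gate per token."""

    def __init__(self, dim, num_heads=8, window=16, global_stride=8):
        super().__init__()
        self.h, self.window, self.stride = num_heads, window, global_stride
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, 1)  # per-token local-vs-global mix

    def forward(self, x):  # x: (B, N, C); assumes N % window == 0
        B, N, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        split = lambda t: t.reshape(B, N, self.h, C // self.h).transpose(1, 2)
        q, k, v = map(split, (q, k, v))  # each: (B, h, N, d)

        # Local branch: full attention inside non-overlapping windows.
        w = self.window
        ql = q.reshape(B, self.h, N // w, w, -1)
        kl = k.reshape(B, self.h, N // w, w, -1)
        vl = v.reshape(B, self.h, N // w, w, -1)
        local = F.scaled_dot_product_attention(ql, kl, vl)
        local = local.reshape(B, self.h, N, -1)

        # Global branch: every query attends to a strided token subset,
        # keeping cost roughly linear in sequence length.
        kg, vg = k[:, :, ::self.stride], v[:, :, ::self.stride]
        glob = F.scaled_dot_product_attention(q, kg, vg)

        # Adaptive mix: a learned gate decides, per token, how much
        # global context to blend into the local result.
        g = torch.sigmoid(self.gate(x)).unsqueeze(1)  # (B, 1, N, 1)
        out = g * glob + (1 - g) * local
        return self.proj(out.transpose(1, 2).reshape(B, N, C))

# Quick shape check: 256 tokens, 16-token windows, every 8th token global.
attn = GlobalLocalSparseAttention(dim=512)
y = attn(torch.randn(2, 256, 512))  # -> (2, 256, 512)
```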
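For the second component, the sketch below shows one common way such elastic supernetwork training is implemented: each step samples several sub-DiT configurations (a depth and a width ratio), accumulates their losses into the shared weights, and always includes the largest configuration (a "sandwich"-style rule from slimmable-network training). The `set_active_subnet` and `diffusion_loss` hooks and the sampling grid are hypothetical placeholders, not the paper's API.

```python
import random

WIDTH_RATIOS = (0.5, 0.75, 1.0)  # fraction of channels kept per block (assumed)
DEPTH_CHOICES = (8, 12, 16)      # number of transformer blocks used (assumed)

def elastic_train_step(supernet, batch, optimizer, num_subnets=3):
    """One joint optimization step over several sampled sub-DiTs."""
    optimizer.zero_grad()
    # Always train the full model, plus randomly sampled smaller sub-DiTs.
    configs = [(max(DEPTH_CHOICES), 1.0)] + [
        (random.choice(DEPTH_CHOICES), random.choice(WIDTH_RATIOS))
        for _ in range(num_subnets - 1)
    ]
    total = 0.0
    for depth, width in configs:
        # Hypothetical hook: activate a sub-DiT that reuses shared weights.
        supernet.set_active_subnet(depth=depth, width_ratio=width)
        # Hypothetical hook: the usual denoising objective on this batch.
        loss = supernet.diffusion_loss(batch)
        loss.backward()  # gradients from all sub-DiTs accumulate jointly
        total += loss.item()
    optimizer.step()
    return total / len(configs)
```

At deployment time, the same trained weights then serve different hardware budgets by fixing one (depth, width) configuration, which is what lets a single model "dynamically adjust for efficient inference."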
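Finally, for the distillation component, here is a minimal sketch of a combined objective consistent with the abstract's description: a DMD-style distribution-matching term (the standard surrogate that applies the score difference of a "real" and a "fake" critic through the student's sample) plus a regression term toward a few-step teacher's output from the same noise. The sampling/critic interfaces, the 4-step budget, and the weighting `lambda_kd` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def kd_dmd_loss(student, few_step_teacher, real_critic, fake_critic,
                noise, cond, t, lambda_kd=0.25):
    # Student generates in a few steps (e.g., 4) from pure noise.
    x_student = student.sample(noise, cond, num_steps=4)  # hypothetical API

    # DMD term: nudge the student's output distribution toward the data
    # distribution via the critics' score difference (stop-grad on critics).
    with torch.no_grad():
        s_real = real_critic.score(x_student, t, cond)  # hypothetical API
        s_fake = fake_critic.score(x_student, t, cond)
        grad = s_fake - s_real  # DMD gradient direction
    # Standard surrogate: MSE to a detached shifted target applies `grad`
    # as the gradient of this loss w.r.t. x_student.
    dmd = F.mse_loss(x_student, (x_student - grad).detach())

    # Knowledge-transfer term: match a few-step teacher's sample from the
    # same noise, transferring its already-distilled sampling behavior.
    with torch.no_grad():
        x_teacher = few_step_teacher.sample(noise, cond, num_steps=4)
    kd = F.mse_loss(x_student, x_teacher)

    return dmd + lambda_kd * kd
```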