Multi-Scale Local Speculative Decoding for Image Generation

January 8, 2026
Authors: Elia Peruzzo, Guillaume Sautière, Amirhossein Habibian
cs.AI

Abstract

Autoregressive (AR) models have achieved remarkable success in image synthesis, yet their sequential nature imposes significant latency constraints. Speculative decoding offers a promising avenue for acceleration, but existing approaches are limited by token-level ambiguity and a lack of spatial awareness. In this work, we introduce Multi-Scale Local Speculative Decoding (MuLo-SD), a novel framework that combines multi-resolution drafting with spatially informed verification to accelerate AR image generation. Our method pairs a low-resolution drafter with learned up-samplers to propose candidate image tokens, which are then verified in parallel by a high-resolution target model. Crucially, we incorporate a local rejection and resampling mechanism that corrects draft errors efficiently by focusing on spatial neighborhoods rather than resampling in raster-scan order after the first rejection. We demonstrate that MuLo-SD achieves substantial speedups of up to 1.7×, outperforming strong speculative decoding baselines such as EAGLE-2 and LANTERN in terms of acceleration, while maintaining comparable semantic alignment and perceptual quality. These results are validated using GenEval, DPG-Bench, and FID/HPSv2 on the MS-COCO 5k validation split. Extensive ablations highlight the impact of up-sampling design, probability pooling, and local rejection and resampling with neighborhood expansion. Our approach sets a new state of the art in speculative decoding for image synthesis, bridging the gap between efficiency and fidelity.
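To make the contrast between raster-scan and local resampling concrete, the toy sketch below counts how many tokens of a 2D token grid would be redrawn under each policy after a single rejection. It is an illustration only, not the authors' implementation: the abstract does not specify the neighborhood shape or the expansion rule, so the square window, the `radius` parameter, and the function names `raster_resample_mask` and `local_resample_mask` are all assumptions made here.

```python
def raster_resample_mask(accepted):
    """Raster-scan policy: redraw every token from the first rejection onward."""
    h, w = len(accepted), len(accepted[0])
    flat = [accepted[r][c] for r in range(h) for c in range(w)]
    first = flat.index(False) if False in flat else len(flat)
    return [[(r * w + c) >= first for c in range(w)] for r in range(h)]


def local_resample_mask(accepted, radius=1):
    """Assumed local policy: redraw only tokens inside a square window
    of half-width `radius` around each rejected token (hypothetical rule)."""
    h, w = len(accepted), len(accepted[0])
    mask = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if accepted[r][c]:
                continue
            # Mark the window around the rejected token, clipped to the grid.
            for rr in range(max(0, r - radius), min(h, r + radius + 1)):
                for cc in range(max(0, c - radius), min(w, c + radius + 1)):
                    mask[rr][cc] = True
    return mask


def count(mask):
    """Number of tokens flagged for resampling."""
    return sum(sum(row) for row in mask)


# 6x6 grid of drafted tokens with one rejection at row 1, column 1.
accepted = [[True] * 6 for _ in range(6)]
accepted[1][1] = False

raster = raster_resample_mask(accepted)
local = local_resample_mask(accepted, radius=1)
print(count(raster), count(local))  # prints "29 9"
```

In this toy setting, one early rejection forces 29 of 36 tokens to be redrawn under the raster-scan policy but only 9 under the assumed local window, which is the intuition behind restricting resampling to a spatial neighborhood. Raising `radius` mimics a larger neighborhood at the cost of redrawing more tokens.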