Distilling semantically aware orders for autoregressive image generation

April 23, 2025
Authors: Rishav Pramanik, Antoine Poupon, Juan A. Rodriguez, Masih Aminbeidokhti, David Vazquez, Christopher Pal, Zhaozheng Yin, Marco Pedersoli
cs.AI

Abstract

Autoregressive patch-based image generation has recently shown competitive results in terms of image quality and scalability. It can also be easily integrated and scaled within Vision-Language models. Nevertheless, autoregressive models require a defined order for patch generation. While a natural order based on the sequence of words makes sense for text generation, no inherent generation order exists for image generation. Traditionally, a raster-scan order (from top-left to bottom-right) guides autoregressive image generation models. In this paper, we argue that this order is suboptimal, as it fails to respect the causality of the image content: for instance, when conditioned on a visual description of a sunset, an autoregressive model may generate clouds before the sun, even though the color of the clouds should depend on the color of the sun and not the reverse. First, we show that by training a model to generate patches in any given order, we can infer both the content and the location (order) of each patch during generation. Second, we use these extracted orders to fine-tune the any-given-order model to produce better-quality images. Through experiments on two datasets, we show that this new generation method produces better images than the traditional raster-scan approach, with similar training costs and no extra annotations.
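
To make the distinction between orders concrete, the following minimal Python sketch contrasts a raster-scan order over a patch grid with an arbitrary order in which each step carries both a location and a content. It is an illustration only, not the authors' implementation: the function names (`raster_scan_order`, `shuffled_order`, `to_sequence`) and the toy 2x2 grid of string "patches" are assumptions made for this example, and the shuffled order is merely a stand-in for an order that a trained model would infer.

```python
import random


def raster_scan_order(height, width):
    """Patch coordinates from top-left to bottom-right, row by row."""
    return [(row, col) for row in range(height) for col in range(width)]


def shuffled_order(height, width, seed=0):
    """Same coordinates in a shuffled order.

    Hypothetical stand-in for a learned, content-dependent order.
    """
    order = raster_scan_order(height, width)
    random.Random(seed).shuffle(order)
    return order


def to_sequence(patches, order):
    """Pair each position with its patch, so every generation step
    specifies both where a patch goes and what it contains."""
    return [((row, col), patches[row][col]) for row, col in order]


# Toy 2x2 "image" whose patches are labeled by content (assumed example).
patches = [["sky", "sun"], ["cloud", "sea"]]

print(to_sequence(patches, raster_scan_order(2, 2)))
print(to_sequence(patches, shuffled_order(2, 2)))
```

In the raster-scan sequence the position of each patch is implied by its index, whereas in the any-given-order setting the position has to be part of each step, which is what allows the order itself to be inferred during generation.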
