

One-step Latent-free Image Generation with Pixel Mean Flows

January 29, 2026
作者: Yiyang Lu, Susie Lu, Qiao Sun, Hanhong Zhao, Zhicheng Jiang, Xianbang Wang, Tianhong Li, Zhengyang Geng, Kaiming He
cs.AI

Abstract

Modern diffusion/flow-based models for image generation typically exhibit two core characteristics: (i) using multi-step sampling, and (ii) operating in a latent space. Recent advances have made encouraging progress on each aspect individually, paving the way toward one-step diffusion/flow without latents. In this work, we take a further step towards this goal and propose "pixel MeanFlow" (pMF). Our core guideline is to formulate the network output space and the loss space separately. The network target is designed to be on a presumed low-dimensional image manifold (i.e., x-prediction), while the loss is defined via MeanFlow in the velocity space. We introduce a simple transformation between the image manifold and the average velocity field. In experiments, pMF achieves strong results for one-step latent-free generation on ImageNet at 256×256 resolution (2.22 FID) and 512×512 resolution (2.48 FID), filling a key missing piece in this regime. We hope that our study will further advance the boundaries of diffusion/flow-based generative models.
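To make the x-prediction/velocity-space split concrete, the following NumPy sketch illustrates one plausible form of the conversion for the special case used in one-step sampling (r = 0, t = 1). It assumes the common MeanFlow-style interpolation z_t = (1 − t)·x + t·ε, where the average velocity u over [r, t] satisfies z_r = z_t − (t − r)·u; the network here (`x_pred_net`) is a stand-in, and the paper's exact transformation for general (r, t) may differ.

```python
import numpy as np

# Hypothetical sketch, NOT the paper's implementation. Assumes the
# interpolation z_t = (1 - t) * x + t * eps (t = 1 is pure noise) and the
# MeanFlow update z_r = z_t - (t - r) * u for the average velocity u.

def u_from_x_pred(z_t, x_pred, t):
    """Average velocity over [0, t] implied by an x-prediction:
    z_0 = z_t - t * u  =>  u = (z_t - x_pred) / t."""
    return (z_t - x_pred) / t

def one_step_sample(x_pred_net, shape, rng):
    """One-step generation: start from pure noise z_1 and jump to t = 0."""
    z1 = rng.standard_normal(shape)
    x_pred = x_pred_net(z1, t=1.0)        # network output on the image manifold
    u = u_from_x_pred(z1, x_pred, t=1.0)  # map x-prediction to velocity space
    return z1 - 1.0 * u                   # z_0 = z_1 - (1 - 0) * u

rng = np.random.default_rng(0)
toy_net = lambda z, t: np.tanh(z)  # toy stand-in for the trained network
sample = one_step_sample(toy_net, (4, 4), rng)
```

Note that at (r, t) = (0, 1) the one-step sample collapses to the network's x-prediction itself; the point of the velocity-space parameterization is that the MeanFlow training loss is defined on u, while the network target stays on the image manifold.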