One-step Latent-free Image Generation with Pixel Mean Flows
January 29, 2026
Authors: Yiyang Lu, Susie Lu, Qiao Sun, Hanhong Zhao, Zhicheng Jiang, Xianbang Wang, Tianhong Li, Zhengyang Geng, Kaiming He
cs.AI
Abstract
Modern diffusion/flow-based models for image generation typically exhibit two core characteristics: (i) using multi-step sampling, and (ii) operating in a latent space. Recent advances have made encouraging progress on each aspect individually, paving the way toward one-step diffusion/flow without latents. In this work, we take a further step towards this goal and propose "pixel MeanFlow" (pMF). Our core guideline is to formulate the network output space and the loss space separately. The network target is designed to be on a presumed low-dimensional image manifold (i.e., x-prediction), while the loss is defined via MeanFlow in the velocity space. We introduce a simple transformation between the image manifold and the average velocity field. In experiments, pMF achieves strong results for one-step latent-free generation on ImageNet at 256x256 resolution (2.22 FID) and 512x512 resolution (2.48 FID), filling a key missing piece in this regime. We hope that our study will further advance the boundaries of diffusion/flow-based generative models.
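The abstract mentions a "simple transformation between the image manifold and the average velocity field" but does not spell it out. One natural instance, under the common linear interpolation path z_t = (1 - t)·x + t·ε used in flow matching, is sketched below: since z_0 = x on that path, the ground-truth average velocity over [0, t] is u = (z_t - x)/t, so an x-prediction can be converted algebraically into an average-velocity estimate for a MeanFlow-style loss and one-step sampler. This is a hypothetical sketch under those assumptions, not the paper's exact parameterization; the function names are ours.

```python
import numpy as np

def x_to_avg_velocity(z_t, x_pred, t, eps=1e-6):
    """Map an x-prediction to an average velocity over [0, t].

    Assumes the linear path z_t = (1 - t) * x + t * noise, where the
    true average velocity from time 0 to t is (z_t - z_0) / t = (z_t - x) / t.
    (Hypothetical sketch; pMF's actual transform may differ.)
    """
    return (z_t - x_pred) / np.maximum(t, eps)

def one_step_sample(z_1, x_pred):
    """One-step generation from pure noise z_1 at t = 1:
    z_0 = z_1 - (1 - 0) * u(z_1, 0, 1), which reduces to x_pred here."""
    u = x_to_avg_velocity(z_1, x_pred, t=1.0)
    return z_1 - 1.0 * u
```

Note that on the linear path the instantaneous velocity ε - x is constant in t, so the average velocity coincides with it; this is what makes the x-to-u conversion a single algebraic step rather than an integral.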