Interpolating between Images with Diffusion Models
July 24, 2023
Authors: Clinton J. Wang, Polina Golland
cs.AI
Abstract
One little-explored frontier of image generation and editing is the task of
interpolating between two input images, a feature missing from all currently
deployed image generation pipelines. We argue that such a feature can expand
the creative applications of these models, and propose a method for zero-shot
interpolation using latent diffusion models. We apply interpolation in the
latent space at a sequence of decreasing noise levels, then perform denoising
conditioned on interpolated text embeddings derived from textual inversion and
(optionally) subject poses. For greater consistency, or to specify additional
criteria, we can generate several candidates and use CLIP to select the highest
quality image. We obtain convincing interpolations across diverse subject
poses, image styles, and image content, and show that standard quantitative
metrics such as FID are insufficient to measure the quality of an
interpolation. Code and data are available at
https://clintonjwang.github.io/interpolation.
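
To make the recipe above concrete, here is a minimal sketch of the core loop. It is not the authors' released code (see the project page for that): it assumes Stable-Diffusion-style VAE latents with DDPM noising, uses a single noise level rather than the paper's schedule of decreasing noise levels, and the `denoise` callable is a hypothetical stand-in for the model's reverse process conditioned on a text embedding.

```python
# Sketch of latent-space interpolation between two images, assuming
# Stable-Diffusion-style VAE latents and a DDPM forward process.
# `denoise` is a hypothetical stand-in for the reverse diffusion process
# (e.g. a diffusers scheduler loop); it is NOT part of the paper's code.
import math
import torch

def slerp(z0: torch.Tensor, z1: torch.Tensor, t: float, eps: float = 1e-7) -> torch.Tensor:
    """Spherical interpolation: noised latents are near-Gaussian, so slerp
    preserves their norm where linear interpolation would shrink it."""
    a, b = z0.flatten(), z1.flatten()
    cos = torch.clamp(torch.dot(a, b) / (a.norm() * b.norm() + eps), -1.0, 1.0)
    theta = torch.acos(cos)
    if theta.abs() < eps:                      # nearly parallel: fall back to lerp
        return (1 - t) * z0 + t * z1
    return (torch.sin((1 - t) * theta) * z0 + torch.sin(t * theta) * z1) / torch.sin(theta)

def add_noise(z0: torch.Tensor, eps: torch.Tensor, alpha_bar: float) -> torch.Tensor:
    """DDPM forward process q(z_t | z_0) at cumulative signal level alpha_bar."""
    return math.sqrt(alpha_bar) * z0 + math.sqrt(1.0 - alpha_bar) * eps

@torch.no_grad()
def interpolate(z_a, z_b, emb_a, emb_b, fracs, alpha_bar, denoise):
    """Noise both endpoint latents to the same level, slerp between them,
    and denoise each mixture under an interpolated text embedding."""
    eps = torch.randn_like(z_a)                # shared noise keeps frames coherent
    za_t = add_noise(z_a, eps, alpha_bar)
    zb_t = add_noise(z_b, eps, alpha_bar)
    frames = []
    for t in fracs:                            # e.g. [0.25, 0.5, 0.75]
        z_t = slerp(za_t, zb_t, t)
        emb_t = (1 - t) * emb_a + t * emb_b    # interpolated conditioning
        frames.append(denoise(z_t, emb_t, alpha_bar))
    return frames
```

The candidate-selection step can be sketched with an off-the-shelf CLIP model; the checkpoint and scoring prompt below are placeholders, not values from the paper:

```python
import torch
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def pick_best(images, criterion="a high quality, coherent photograph"):
    """Rank decoded candidate frames by CLIP similarity to a text criterion."""
    inputs = proc(text=[criterion], images=images, return_tensors="pt", padding=True)
    scores = clip(**inputs).logits_per_image.squeeze(-1)  # shape: (num_images,)
    return images[scores.argmax().item()]
```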