FRAP: Faithful and Realistic Text-to-Image Generation with Adaptive Prompt Weighting

August 21, 2024
Authors: Liyao Jiang, Negar Hassanpour, Mohammad Salameh, Mohan Sai Singamsetti, Fengyu Sun, Wei Lu, Di Niu
cs.AI

Abstract

Text-to-image (T2I) diffusion models have demonstrated impressive capabilities in generating high-quality images from a text prompt. However, ensuring prompt-image alignment, i.e., generating images that faithfully align with the prompt's semantics, remains a considerable challenge. Recent works attempt to improve faithfulness by optimizing the latent code, which could push the latent code out of distribution and thus produce unrealistic images. In this paper, we propose FRAP, a simple yet effective approach that adaptively adjusts the per-token prompt weights to improve both prompt-image alignment and the authenticity of the generated images. We design an online algorithm that adaptively updates each token's weight coefficient by minimizing a unified objective function that encourages object presence and the binding of object-modifier pairs. Through extensive evaluations, we show that FRAP generates images with significantly higher prompt-image alignment on prompts from complex datasets, while having a lower average latency than recent latent code optimization methods, e.g., 4 seconds faster than D&B on the COCO-Subject dataset. Furthermore, through visual comparisons and evaluation on the CLIP-IQA-Real metric, we show that FRAP not only improves prompt-image alignment but also generates more authentic images with realistic appearances. We also explore combining FRAP with a prompt-rewriting LLM to recover the prompt-image alignment that rewriting degrades, and observe improvements in both prompt-image alignment and image quality.
