VLM-Guided Adaptive Negative Prompting for Creative Generation

October 12, 2025
Authors: Shelly Golan, Yotam Nitzan, Zongze Wu, Or Patashnik
cs.AI

Abstract

Creative generation is the synthesis of new, surprising, and valuable samples that reflect user intent yet cannot be envisioned in advance. This task aims to extend human imagination, enabling the discovery of visual concepts that exist in the unexplored spaces between familiar domains. While text-to-image diffusion models excel at rendering photorealistic scenes that faithfully match user prompts, they still struggle to generate genuinely novel content. Existing approaches to enhance generative creativity either rely on interpolation of image features, which restricts exploration to predefined categories, or require time-intensive procedures such as embedding optimization or model fine-tuning. We propose VLM-Guided Adaptive Negative-Prompting, a training-free, inference-time method that promotes creative image generation while preserving the validity of the generated object. Our approach utilizes a vision-language model (VLM) that analyzes intermediate outputs of the generation process and adaptively steers it away from conventional visual concepts, encouraging the emergence of novel and surprising outputs. We evaluate creativity through both novelty and validity, using statistical metrics in the CLIP embedding space. Through extensive experiments, we show consistent gains in creative novelty with negligible computational overhead. Moreover, unlike existing methods that primarily generate single objects, our approach extends to complex scenarios, such as generating coherent sets of creative objects and preserving creativity within elaborate compositional prompts. Our method integrates seamlessly into existing diffusion pipelines, offering a practical route to producing creative outputs that venture beyond the constraints of textual descriptions.
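As a rough illustration of the inference-time loop described in the abstract (not the authors' implementation), the following sketch shows how a VLM could adaptively grow a negative prompt during sampling. All functions here (`denoise_step`, `decode_preview`, `vlm_describe`) are hypothetical stubs standing in for a real diffusion sampler, latent decoder, and vision-language model:

```python
# Illustrative sketch of VLM-guided adaptive negative prompting.
# Every component below is a placeholder stub; a real pipeline would
# call an actual diffusion model and vision-language model.

def denoise_step(latent, prompt, negative_prompt):
    # Stub for one guided denoising step (e.g. classifier-free guidance,
    # where negative_prompt replaces the unconditional branch).
    return latent + 1  # placeholder latent update

def decode_preview(latent):
    # Stub: decode the intermediate latent into a preview image.
    return f"preview_of_{latent}"

def vlm_describe(image):
    # Stub: the VLM names the conventional concept it recognizes
    # in the intermediate output.
    return "teapot"

def generate(prompt, steps=50, query_every=10):
    latent = 0          # stand-in for the initial noise latent
    negatives = []      # negative prompt, grown adaptively
    for t in range(steps):
        if t % query_every == 0:
            concept = vlm_describe(decode_preview(latent))
            if concept not in negatives:
                # Steer subsequent steps away from the recognized
                # conventional concept.
                negatives.append(concept)
        latent = denoise_step(latent, prompt, ", ".join(negatives))
    return latent, negatives

final_latent, negatives = generate("a creative new kind of pet")
print(negatives)  # concepts the sampler was steered away from
```

Because the negative prompt is updated from what the model is actually producing, rather than fixed in advance, the steering adapts per sample; this is the property that lets the approach avoid a predefined category list, as the abstract notes.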