InstanceGen: Image Generation with Instance-level Instructions
May 8, 2025
Authors: Etai Sella, Yanir Kleiman, Hadar Averbuch-Elor
cs.AI
Abstract
Despite rapid advancements in the capabilities of generative models,
pretrained text-to-image models still struggle to capture the semantics
conveyed by complex prompts that combine multiple objects and instance-level
attributes. Consequently, we are witnessing growing interest in integrating
additional structural constraints, typically in the form of coarse bounding
boxes, to better guide the generation process in such challenging cases. In
this work, we take the idea of structural guidance a step further by making the
observation that contemporary image generation models can directly provide a
plausible fine-grained structural initialization. We propose a technique that
couples this image-based structural guidance with LLM-based instance-level
instructions, yielding output images that adhere to all parts of the text
prompt, including object counts, instance-level attributes, and spatial
relations between instances.
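The abstract defines prompt adherence through three checkable properties: object counts, instance-level attributes, and spatial relations between instances. The sketch below illustrates what these mean at the instance level by checking a candidate layout against a hand-written instance-level specification. It is not the InstanceGen method. `Instance`, `check_layout`, `RELATIONS`, and the center-based "left of" test are hypothetical names and simplifications introduced here. In a real pipeline the specification might come from an LLM and the instances from a detector, and neither step is modeled.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical instance record: a label, a coarse box (x0, y0, x1, y1), and attributes."""
    label: str
    box: tuple[float, float, float, float]
    attributes: set[str] = field(default_factory=set)


def center_x(box: tuple[float, float, float, float]) -> float:
    x0, _, x1, _ = box
    return (x0 + x1) / 2


def left_of(a: Instance, b: Instance) -> bool:
    # Simplifying assumption: "a is left of b" iff a's horizontal center is left of b's.
    return center_x(a.box) < center_x(b.box)


# Hypothetical relation vocabulary; only two horizontal relations are modeled.
RELATIONS = {
    "left of": left_of,
    "right of": lambda a, b: left_of(b, a),
}


def check_layout(instances, counts, attributes, relations):
    """Return human-readable violations; an empty list means the layout satisfies the spec.

    counts:     {label: expected number of instances}
    attributes: [(instance index, required attribute), ...]
    relations:  [(instance index i, relation name, instance index j), ...]
    """
    problems = []
    found = Counter(inst.label for inst in instances)
    for label, n in counts.items():
        if found[label] != n:
            problems.append(f"expected {n} x {label}, found {found[label]}")
    for i, attr in attributes:
        if attr not in instances[i].attributes:
            problems.append(f"instance {i} ({instances[i].label}) lacks '{attr}'")
    for i, rel, j in relations:
        if not RELATIONS[rel](instances[i], instances[j]):
            problems.append(f"instance {i} is not {rel} instance {j}")
    return problems


# Usage: a layout for "a black cat left of a white cat, and a brown dog on the right".
layout = [
    Instance("cat", (10, 40, 60, 90), {"black"}),
    Instance("cat", (70, 40, 120, 90), {"white"}),
    Instance("dog", (130, 30, 200, 95), {"brown"}),
]
spec_counts = {"cat": 2, "dog": 1}
spec_attributes = [(0, "black"), (1, "white"), (2, "brown")]
spec_relations = [(0, "left of", 1), (2, "right of", 1)]

print(check_layout(layout, spec_counts, spec_attributes, spec_relations))  # → []
```

The spec is kept separate from the layout so that the same checker could, in principle, score several candidate layouts for one prompt. Real spatial relations are richer than a comparison of box centers, so this should be read only as an illustration of the three criteria.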