Piece it Together: Part-Based Concepting with IP-Priors
March 13, 2025
Authors: Elad Richardson, Kfir Goldberg, Yuval Alaluf, Daniel Cohen-Or
cs.AI
Abstract
Advanced generative models excel at synthesizing images but typically rely on
text-based conditioning. Visual designers, however, often work beyond language,
directly drawing inspiration from existing visual elements. In many cases,
these elements represent only fragments of a potential concept, such as a
uniquely structured wing or a specific hairstyle, and serve as inspiration for
the artist to explore how they can come together creatively into a coherent
whole. Recognizing this need, we introduce a generative framework that
seamlessly integrates a partial set of user-provided visual components into a
coherent composition while simultaneously sampling the missing parts needed to
generate a plausible and complete concept. Our approach builds on a strong and
underexplored representation space, extracted from IP-Adapter+, on which we
train IP-Prior, a lightweight flow-matching model that synthesizes coherent
compositions based on domain-specific priors, enabling diverse and
context-aware generations. Additionally, we present a LoRA-based fine-tuning
strategy that significantly improves prompt adherence in IP-Adapter+ for a
given task, addressing its common trade-off between reconstruction quality and
prompt adherence.
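
The abstract describes IP-Prior only as a lightweight flow-matching model, without giving its objective or architecture. As a rough illustration of what flow matching means in general, the sketch below uses the common straight-path (rectified-flow style) formulation on plain lists of floats standing in for embeddings. This is an assumption made for illustration, not the authors' implementation: every function name here is hypothetical, the learned network is replaced by a closed-form oracle for a single target point, and the conditioning on user-provided parts is omitted.

```python
# Toy flow matching on plain Python lists (standard library only).
# All names are hypothetical; this is not the IP-Prior code.

def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 to data x1 at time t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight path, which is constant in t."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(velocity_fn, x0, x1, t):
    """Mean squared error between predicted and target velocity at x_t."""
    x_t = interpolate(x0, x1, t)
    pred = velocity_fn(x_t, t)
    target = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


def euler_sample(velocity_fn, x0, steps=10):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_oracle_velocity(data):
    """Exact velocity field when the data distribution is the single point `data`.

    A trained network would approximate this field; it is only valid for t < 1.
    """
    def velocity(x, t):
        return [(d - xi) / (1.0 - t) for d, xi in zip(data, x)]
    return velocity
```

In an actual model, `velocity_fn` would be a neural network trained by minimizing `flow_matching_loss` over random noise samples, data samples, and times, and `euler_sample` would then turn fresh noise into new samples. The oracle is included only so the two pieces can be checked against each other without any training.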