
IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts

August 6, 2024
Authors: Ciara Rowles, Shimon Vainer, Dante De Nigris, Slava Elizarov, Konstantin Kutsy, Simon Donné
cs.AI

Abstract

Diffusion models continuously push the boundary of state-of-the-art image generation, but the process is hard to control with any nuance: practice proves that textual prompts are inadequate for accurately describing image style or fine structural details (such as faces). ControlNet and IPAdapter address this shortcoming by conditioning the generative process on imagery instead, but each individual instance is limited to modeling a single conditional posterior: for practical use cases, where multiple different posteriors are desired within the same workflow, training and using multiple adapters is cumbersome. We propose IPAdapter-Instruct, which combines natural-image conditioning with "Instruct" prompts to swap between interpretations for the same conditioning image: style transfer, object extraction, both, or something else still? IPAdapter-Instruct efficiently learns multiple tasks with minimal loss in quality compared to dedicated per-task models.
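The core idea, a single adapter whose image conditioning is reinterpreted by an instruct embedding, can be illustrated with a minimal sketch. The code below is not the authors' implementation: the module name, fusion scheme (sigmoid gating), and all dimensions are illustrative assumptions. It only shows how one adapter could map the same image embedding to different cross-attention context tokens depending on the instruct embedding.

```python
# Minimal conceptual sketch (not the paper's code) of instruct-modulated
# image conditioning: the same image embedding produces different
# conditioning tokens depending on the instruct embedding.
import torch
import torch.nn as nn


class InstructImageAdapter(nn.Module):
    """Projects an image embedding, gated by an instruct embedding,
    into extra context tokens for a UNet's cross-attention layers.
    All names and dimensions here are hypothetical."""

    def __init__(self, img_dim=1024, instruct_dim=768, ctx_dim=768, n_tokens=4):
        super().__init__()
        self.n_tokens = n_tokens
        self.ctx_dim = ctx_dim
        # The instruct embedding selects *how* the image should be interpreted.
        self.instruct_proj = nn.Linear(instruct_dim, img_dim)
        self.to_tokens = nn.Linear(img_dim, n_tokens * ctx_dim)

    def forward(self, image_emb, instruct_emb):
        # Gate the image features with the instruct signal, then emit tokens.
        fused = image_emb * torch.sigmoid(self.instruct_proj(instruct_emb))
        tokens = self.to_tokens(fused)
        return tokens.view(-1, self.n_tokens, self.ctx_dim)


# The same image embedding with different instruct embeddings yields
# different conditioning tokens, i.e. different conditional posteriors.
adapter = InstructImageAdapter()
image_emb = torch.randn(1, 1024)       # e.g. a CLIP image embedding
style_instruct = torch.randn(1, 768)   # embedding of a "use the style" prompt
object_instruct = torch.randn(1, 768)  # embedding of an "extract the object" prompt

style_tokens = adapter(image_emb, style_instruct)
object_tokens = adapter(image_emb, object_instruct)
print(style_tokens.shape, object_tokens.shape)  # torch.Size([1, 4, 768]) each
```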
