PALP: Prompt Aligned Personalization of Text-to-Image Models
January 11, 2024
Authors: Moab Arar, Andrey Voynov, Amir Hertz, Omri Avrahami, Shlomi Fruchter, Yael Pritch, Daniel Cohen-Or, Ariel Shamir
cs.AI
Abstract
Content creators often aim to create personalized images using personal
subjects that go beyond the capabilities of conventional text-to-image models.
Additionally, they may want the resulting image to encompass a specific
location, style, ambiance, and more. Existing personalization methods may
compromise personalization ability or the alignment to complex textual prompts.
This trade-off can impede the fulfillment of user prompts and subject fidelity.
To address this issue, we propose a new approach that focuses personalization
on a single prompt. We term our approach prompt-aligned
personalization. While this may seem restrictive, our method excels in
improving text alignment, enabling the creation of images with complex and
intricate prompts, which may pose a challenge for current techniques. In
particular, our method keeps the personalized model aligned with a target
prompt using an additional score distillation sampling term. We demonstrate the
versatility of our method in multi- and single-shot settings and further show
that it can compose multiple subjects or use inspiration from reference images,
such as artworks. We compare our approach quantitatively and qualitatively with
existing baselines and state-of-the-art techniques.
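
For readers unfamiliar with score distillation sampling (SDS), the generic form of the SDS gradient, as it is commonly written in the text-to-3D literature, is sketched below. This is not the paper's exact objective: the abstract does not give a formula, so all symbols here (x, \theta, \epsilon_\phi, w(t), y, \lambda) are assumed notation, and the combined loss with weight \lambda is only a hypothetical illustration of how an "additional" alignment term might sit alongside a standard personalization loss.

```latex
% Generic SDS gradient (notation assumed, not taken from the abstract):
%   x             : image (or latent) produced from parameters \theta
%   x_t           : noised version of x at timestep t, using noise \epsilon
%   \epsilon_\phi : frozen pretrained denoiser, conditioned on target prompt y
%   w(t)          : timestep-dependent weight
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\epsilon_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right]

% Hypothetical combined objective: a standard personalization (denoising) loss
% plus the SDS-style alignment term, weighted by an assumed scalar \lambda.
\mathcal{L}_{\mathrm{total}}
  = \mathcal{L}_{\mathrm{personalization}}
  + \lambda\,\mathcal{L}_{\mathrm{SDS}}
```

Read this way, the first term would fit the model to the subject while the second pulls its predictions toward what the pretrained model expects for the target prompt; how the paper actually instantiates and weights the term is specified in its method section, not here.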