PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models

Abstract

Recent advances in personalized text-to-image (T2I) models have revolutionized content creation, empowering non-experts to generate stunning images with unique styles. While promising, adding realistic motion to these personalized images through text poses significant challenges: preserving distinct styles, retaining high-fidelity details, and achieving text-based motion control. In this paper, we present PIA, a Personalized Image Animator that excels at aligning with condition images, achieving motion controllability by text, and remaining compatible with various personalized T2I models without specific tuning. To achieve these goals, PIA builds upon a base T2I model with well-trained temporal alignment layers, allowing any personalized T2I model to be seamlessly transformed into an image animation model. A key component of PIA is the condition module, which takes the condition frame and inter-frame affinity as input and, guided by the affinity hint, transfers appearance information for individual frame synthesis in the latent space. This design mitigates the challenges of appearance-related image alignment and allows a stronger focus on aligning with motion-related guidance.
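To make the condition module's input more concrete, the following is a minimal sketch of one plausible way to pair a condition-frame latent with a per-frame affinity value. The abstract does not say how the two inputs are combined, so everything here is an assumption for illustration: the channel-wise concatenation, the affinity being a single scalar per frame, and the hypothetical helper name `build_condition_inputs`. The learned layers that would consume this input are omitted.

```python
from typing import List

# A latent is represented as channels x height x width nested lists.
Latent = List[List[List[float]]]


def build_condition_inputs(cond_latent: Latent, affinities: List[float]) -> List[Latent]:
    """Hypothetical helper: build one condition input per output frame.

    Assumption (not stated in the abstract): each frame's input is the
    condition-frame latent with one extra channel filled with that
    frame's affinity value.
    """
    height = len(cond_latent[0])
    width = len(cond_latent[0][0])
    inputs: List[Latent] = []
    for s in affinities:
        # Constant map carrying this frame's affinity to the condition frame.
        affinity_channel = [[s] * width for _ in range(height)]
        # Copy the condition channels so frames do not share row lists.
        channels = [[row[:] for row in channel] for channel in cond_latent]
        channels.append(affinity_channel)
        inputs.append(channels)
    return inputs


# Toy example: a 4-channel 2x2 condition latent and three frames whose
# affinity to the condition frame decreases over time.
cond_latent = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(4)]
frame_inputs = build_condition_inputs(cond_latent, [1.0, 0.75, 0.5])
```

In this toy setup every frame sees the same appearance information, and only the affinity channel varies between frames, which is one way to read the abstract's description of appearance transfer "guided by the affinity hint".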
