
Personalize Anything for Free with Diffusion Transformer

March 16, 2025
作者: Haoran Feng, Zehuan Huang, Lin Li, Hairong Lv, Lu Sheng
cs.AI

Abstract

Personalized image generation aims to produce images of user-specified concepts while enabling flexible editing. Recent training-free approaches, while exhibiting higher computational efficiency than training-based methods, struggle with identity preservation, applicability, and compatibility with diffusion transformers (DiTs). In this paper, we uncover the untapped potential of DiT, where simply replacing denoising tokens with those of a reference subject achieves zero-shot subject reconstruction. This simple yet effective feature injection technique unlocks diverse scenarios, from personalization to image editing. Building upon this observation, we propose Personalize Anything, a training-free framework that achieves personalized image generation in DiT through: 1) timestep-adaptive token replacement that enforces subject consistency via early-stage injection and enhances flexibility through late-stage regularization, and 2) patch perturbation strategies to boost structural diversity. Our method seamlessly supports layout-guided generation, multi-subject personalization, and mask-controlled editing. Evaluations demonstrate state-of-the-art performance in identity preservation and versatility. Our work establishes new insights into DiTs while delivering a practical paradigm for efficient personalization.
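The core mechanism the abstract describes — injecting reference-subject tokens into the denoising stream only at early timesteps, optionally perturbing injected patches — can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `timestep_adaptive_replace`, the threshold `t_threshold`, and the Gaussian `perturb_std` perturbation are all illustrative assumptions; the actual schedule and perturbation design in Personalize Anything may differ.

```python
import numpy as np

def timestep_adaptive_replace(denoise_tokens, ref_tokens, subject_mask,
                              t, t_threshold=0.7, perturb_std=0.0, rng=None):
    """Hypothetical sketch of timestep-adaptive token replacement.

    denoise_tokens: (N, D) current denoising tokens of the DiT.
    ref_tokens:     (N, D) tokens extracted from the reference subject.
    subject_mask:   (N,) boolean mask marking the subject's positions.
    t:              normalized timestep in [0, 1]; high t = early denoising.
    """
    tokens = denoise_tokens.copy()
    if t > t_threshold:  # early stage: inject reference tokens for consistency
        injected = ref_tokens.copy()
        if perturb_std > 0:  # "patch perturbation": jitter injected tokens
            rng = rng or np.random.default_rng(0)
            injected = injected + rng.normal(0.0, perturb_std, injected.shape)
        tokens[subject_mask] = injected[subject_mask]
    # late stage (t <= t_threshold): leave tokens free, so the model can
    # harmonize the subject with the surrounding scene
    return tokens

# Toy usage: 4 tokens of dimension 2, first two belong to the subject.
tokens = np.zeros((4, 2))
ref = np.ones((4, 2))
mask = np.array([True, True, False, False])
early = timestep_adaptive_replace(tokens, ref, mask, t=0.9)  # injected
late = timestep_adaptive_replace(tokens, ref, mask, t=0.3)   # untouched
```

In a real DiT sampler this replacement would run inside the denoising loop on the model's latent patch tokens, with the mask derived from a user layout or a segmentation of the reference image.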

