Implicit Preference Alignment for Human Image Animation

Abstract

Human image animation has advanced significantly, yet generating high-fidelity hand motion remains a persistent challenge because hands have many degrees of freedom and move in complex ways. Reinforcement learning from human feedback, particularly direct preference optimization (DPO), offers a potential solution, but it requires strictly paired preference data. Curating such pairs for dynamic hand regions is prohibitively expensive and often impractical because of frame-wise inconsistencies. In this paper, we propose Implicit Preference Alignment (IPA), a data-efficient post-training framework that eliminates the need for paired preference data. Theoretically grounded in implicit reward maximization, IPA aligns the model by maximizing the likelihood of self-generated high-quality samples while penalizing deviations from the pretrained prior. Furthermore, we introduce a Hand-Aware Local Optimization mechanism that explicitly steers the alignment process toward hand regions. Experiments demonstrate that our method achieves effective preference optimization and improves hand generation quality, while significantly lowering the barrier to constructing preference data. Code is available at https://github.com/mdswyz/IPA.
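
The abstract does not state the training objective, so the following is only an illustrative sketch of what an unpaired, implicit-reward loss with hand-region weighting could look like. It is not the authors' implementation. It assumes a Diffusion-DPO-style surrogate in which the implicit reward of a preferred (self-generated, high-quality) sample is the amount by which the trainable model's denoising error falls below the frozen pretrained model's error. All names here (`unpaired_alignment_loss`, `hand_mask`, `hand_weight`, `beta`) are hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def weighted_mse(pred, target, weights):
    """Weighted mean squared error over equal-length flat sequences."""
    total = sum(weights)
    return sum(w * (p - t) ** 2 for p, t, w in zip(pred, target, weights)) / total


def unpaired_alignment_loss(noise, eps_theta, eps_ref, hand_mask,
                            beta=1.0, hand_weight=4.0):
    """One-sided, DPO-style loss on a single preferred sample (assumed form).

    noise      -- the true noise added to the preferred sample
    eps_theta  -- noise predicted by the model being trained
    eps_ref    -- noise predicted by the frozen pretrained reference
    hand_mask  -- 1 where an element lies in a hand region, else 0
    """
    # Assumption: hand elements are simply up-weighted in the error.
    weights = [hand_weight if m else 1.0 for m in hand_mask]
    err_theta = weighted_mse(eps_theta, noise, weights)
    err_ref = weighted_mse(eps_ref, noise, weights)
    # Implicit reward: positive when the trained model denoises the
    # preferred sample better than the reference does.
    implicit_reward = beta * (err_ref - err_theta)
    # -log(sigmoid(r)) == softplus(-r); no rejected sample is needed.
    return softplus(-implicit_reward)
```

In this sketch the loss equals log 2 when the trained model matches the reference, and it decreases as the trained model improves on the preferred sample. Raising `hand_weight` makes improvements inside the hand mask count for more, which is one simple way to bias alignment toward hand regions.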
