Implicit Preference Alignment for Human Image Animation
May 8, 2026
Authors: Yuanzhi Wang, Xuhua Ren, Jiaxiang Cheng, Bing Ma, Kai Yu, Tianxiang Zheng, Qinglin Lu, Zhen Cui
cs.AI
Abstract
Human image animation has witnessed significant advancements, yet generating high-fidelity hand motions remains a persistent challenge due to their high degrees of freedom and motion complexity. While reinforcement learning from human feedback, particularly direct preference optimization, offers a potential solution, it necessitates the construction of strict preference pairs. However, curating such pairs for dynamic hand regions is prohibitively expensive and often impractical due to frame-wise inconsistencies. In this paper, we propose Implicit Preference Alignment (IPA), a data-efficient post-training framework that eliminates the need for paired preference data. Theoretically grounded in implicit reward maximization, IPA aligns the model by maximizing the likelihood of self-generated high-quality samples while penalizing deviations from the pretrained prior. Furthermore, we introduce a Hand-Aware Local Optimization mechanism to explicitly steer the alignment process toward hand regions. Experiments demonstrate that our method achieves effective preference optimization that enhances hand generation quality, while significantly lowering the barrier to constructing preference data. Code is released at https://github.com/mdswyz/IPA.
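The core idea of the abstract, maximizing the likelihood of self-generated high-quality samples while penalizing deviation from the pretrained prior, can be sketched on toy discrete distributions. This is an illustrative sketch only, not the paper's implementation; the function name `ipa_loss` and the weighting term `beta` are assumptions:

```python
import math

def ipa_loss(model_probs, prior_probs, good_sample_idx, beta=0.1):
    """Toy objective in the spirit of IPA: negative log-likelihood of a
    self-generated high-quality sample, plus a KL penalty discouraging
    drift from the pretrained prior. (Illustrative; `beta` is assumed.)"""
    # Likelihood term: reward the model for putting mass on the good sample.
    nll = -math.log(model_probs[good_sample_idx])
    # Prior-preservation term: KL(model || prior) over the discrete support.
    kl = sum(p * math.log(p / q)
             for p, q in zip(model_probs, prior_probs) if p > 0)
    return nll + beta * kl

# Example: a model that stays close to the prior while concentrating
# probability on the preferred (index 0) sample incurs a small loss.
model = [0.7, 0.2, 0.1]
prior = [0.6, 0.3, 0.1]
loss = ipa_loss(model, prior, good_sample_idx=0)
```

With the model equal to the prior, the KL term vanishes and only the likelihood term remains, so the loss reduces to `-log p(good sample)`.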