Implicit Preference Alignment for Human Image Animation

Abstract

Human image animation has advanced significantly, yet generating high-fidelity hand motion remains a persistent challenge because hands have many degrees of freedom and move in complex ways. Reinforcement learning from human feedback, particularly direct preference optimization, offers a potential solution, but it requires constructing strict preference pairs. Curating such pairs for dynamic hand regions is prohibitively expensive and often impractical because of frame-wise inconsistencies. In this paper, we propose Implicit Preference Alignment (IPA), a data-efficient post-training framework that eliminates the need for paired preference data. Theoretically grounded in implicit reward maximization, IPA aligns the model by maximizing the likelihood of self-generated high-quality samples while penalizing deviations from the pretrained prior. Furthermore, we introduce a Hand-Aware Local Optimization mechanism to explicitly steer the alignment process toward hand regions. Experiments demonstrate that our method achieves effective preference optimization that improves hand generation quality while significantly lowering the barrier to constructing preference data. Code is available at https://github.com/mdswyz/IPA
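
The abstract does not state the training objective itself, so the following is only an illustrative sketch of the general shape it describes: a fit term on self-generated high-quality samples, a penalty for drifting from the pretrained model, and extra weight on hand pixels. Everything here is an assumption made for illustration, including the use of a squared denoising error as a stand-in for likelihood, the function name `ipa_style_loss`, and the parameters `beta` and `hand_weight`; none of it is taken from the paper or its released code.

```python
from typing import Sequence


def ipa_style_loss(
    pred: Sequence[float],       # current model's prediction per pixel (hypothetical)
    target: Sequence[float],     # target from a self-generated high-quality sample
    ref_pred: Sequence[float],   # frozen pretrained model's prediction
    hand_mask: Sequence[float],  # 1.0 on hand pixels, 0.0 elsewhere
    beta: float = 0.1,           # assumed strength of the prior penalty
    hand_weight: float = 2.0,    # assumed extra weight on hand pixels
) -> float:
    """Toy loss: hand-weighted fit term plus a penalty for leaving the prior."""
    n = len(pred)
    if n == 0 or not (len(target) == len(ref_pred) == len(hand_mask) == n):
        raise ValueError("all inputs must be non-empty and of equal length")

    fit = 0.0    # squared error to the preferred sample (likelihood proxy)
    prior = 0.0  # squared distance to the pretrained model's output
    for p, t, r, m in zip(pred, target, ref_pred, hand_mask):
        weight = 1.0 + (hand_weight - 1.0) * m  # hand pixels count hand_weight times
        fit += weight * (p - t) ** 2
        prior += (p - r) ** 2
    return (fit + beta * prior) / n
```

In this toy form, raising `hand_weight` concentrates the update on hand regions, and raising `beta` keeps the fine-tuned model closer to the pretrained one; the paper's actual formulation may differ in both respects.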
