Improving Video Generation with Human Feedback

Abstract

Video generation has advanced significantly through rectified flow techniques, but problems such as unsmooth motion and misalignment between videos and prompts persist. In this work, we develop a systematic pipeline that uses human feedback to mitigate these problems and refine the video generation model. Specifically, we first construct a large-scale human preference dataset focused on modern video generation models, with pairwise annotations across multiple dimensions. We then introduce VideoReward, a multi-dimensional video reward model, and examine how the annotations and various design choices affect its effectiveness as a reward signal. From a unified reinforcement learning perspective that maximizes reward under KL regularization, we introduce three alignment algorithms for flow-based models by extending their counterparts for diffusion models. These comprise two training-time strategies, direct preference optimization for flow (Flow-DPO) and reward weighted regression for flow (Flow-RWR), and one inference-time technique, Flow-NRG, which applies reward guidance directly to noisy videos. Experimental results indicate that VideoReward significantly outperforms existing reward models, and that Flow-DPO outperforms both Flow-RWR and standard supervised fine-tuning. Additionally, Flow-NRG lets users assign custom weights to multiple objectives during inference, meeting personalized video quality needs. Project page: https://gongyeliu.github.io/videoalign
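For readers unfamiliar with the "maximizing reward with KL regularization" framing, the objective is commonly written as below. This is the standard form from the RLHF literature, not a formula quoted from this abstract. The symbols are notational assumptions: $p_\theta$ is the model being aligned, $p_{\mathrm{ref}}$ a frozen reference model, $r$ the reward (the role VideoReward would play), $c$ a prompt drawn from a prompt set $\mathcal{D}$, $x$ a generated video, and $\beta > 0$ the regularization strength.

```latex
% KL-regularized reward maximization (standard form; symbols are assumptions)
\max_{\theta}\;
  \mathbb{E}_{c \sim \mathcal{D},\; x \sim p_\theta(\cdot \mid c)}
    \big[\, r(x, c) \,\big]
  \;-\;
  \beta\,
  \mathbb{E}_{c \sim \mathcal{D}}
    \Big[ D_{\mathrm{KL}}\big( p_\theta(\cdot \mid c) \,\big\|\, p_{\mathrm{ref}}(\cdot \mid c) \big) \Big]

% Its well-known closed-form optimum, with Z(c) the normalizing constant
p^{*}(x \mid c)
  \;=\;
  \frac{1}{Z(c)}\; p_{\mathrm{ref}}(x \mid c)\,
  \exp\!\Big( \tfrac{1}{\beta}\, r(x, c) \Big)
```

In this form, a larger $\beta$ keeps the aligned model closer to the reference, while a smaller $\beta$ favors reward. Preference-optimization and reward-weighted-regression methods are typically derived from this closed-form optimum; how the paper adapts them to rectified flow is not stated in the abstract.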
