
Improving Video Generation with Human Feedback

January 23, 2025
Authors: Jie Liu, Gongye Liu, Jiajun Liang, Ziyang Yuan, Xiaokun Liu, Mingwu Zheng, Xiele Wu, Qiulin Wang, Wenyu Qin, Menghan Xia, Xintao Wang, Xiaohong Liu, Fei Yang, Pengfei Wan, Di Zhang, Kun Gai, Yujiu Yang, Wanli Ouyang
cs.AI

Abstract

Video generation has achieved significant advances through rectified flow techniques, but issues such as unsmooth motion and misalignment between videos and prompts persist. In this work, we develop a systematic pipeline that harnesses human feedback to mitigate these problems and refine the video generation model. Specifically, we begin by constructing a large-scale human preference dataset focused on modern video generation models, incorporating pairwise annotations across multiple dimensions. We then introduce VideoReward, a multi-dimensional video reward model, and examine how annotations and various design choices affect its effectiveness as a reward model. From a unified reinforcement learning perspective aimed at maximizing reward with KL regularization, we introduce three alignment algorithms for flow-based models by extending those from diffusion models. These include two training-time strategies, direct preference optimization for flow (Flow-DPO) and reward-weighted regression for flow (Flow-RWR), and an inference-time technique, Flow-NRG, which applies reward guidance directly to noisy videos. Experimental results indicate that VideoReward significantly outperforms existing reward models, and that Flow-DPO outperforms both Flow-RWR and standard supervised fine-tuning. Additionally, Flow-NRG lets users assign custom weights to multiple objectives during inference, meeting personalized video quality needs. Project page: https://gongyeliu.github.io/videoalign.
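For reference, the KL-regularized reward-maximization objective the abstract alludes to usually takes the standard RLHF form below (a sketch in common notation, not necessarily the paper's exact formulation: r is the learned reward model, beta the regularization strength, and p_ref the pretrained reference model):

```latex
% Standard KL-regularized reward maximization (sketch):
% push the generator toward high-reward samples while
% penalizing divergence from the pretrained reference model.
\max_{\theta}\; \mathbb{E}_{x \sim p_{\theta}}\big[ r(x) \big]
  \;-\; \beta\, D_{\mathrm{KL}}\!\big( p_{\theta} \,\|\, p_{\mathrm{ref}} \big)
```

In the abstract's framing, Flow-DPO and Flow-RWR pursue this objective at training time, while Flow-NRG approximates it at inference time by applying reward guidance during sampling.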
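The inference-time idea behind Flow-NRG, steering a flow model's sampling step with gradients of a reward evaluated on the noisy video latent, might look like the following minimal PyTorch sketch. All names here (`flow_nrg_step`, `velocity_model`, `reward_heads`) and the simple Euler update are hypothetical illustrations of the technique, not the paper's implementation:

```python
import torch

def flow_nrg_step(x_t, t, velocity_model, reward_heads, weights,
                  dt, guidance_scale=1.0):
    """One Euler step of rectified-flow sampling with weighted reward guidance.

    Hypothetical sketch of the Flow-NRG idea from the abstract: reward
    heads score the *noisy* latent directly, and the gradient of their
    user-weighted sum steers the velocity field toward preferred videos.
    """
    with torch.enable_grad():
        x = x_t.detach().requires_grad_(True)
        # User-weighted mix of per-dimension rewards
        # (e.g. motion smoothness, text-video alignment).
        total_reward = sum(w * head(x, t)
                           for head, w in zip(reward_heads, weights))
        grad = torch.autograd.grad(total_reward.sum(), x)[0]
    v = velocity_model(x_t, t)                     # base flow velocity
    return x_t + dt * (v + guidance_scale * grad)  # reward-guided Euler update
```

Adjusting `weights` at sampling time is what would give users the per-objective control the abstract describes, without retraining the generator.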
