Video Diffusion Alignment via Reward Gradients
July 11, 2024
Authors: Mihir Prabhudesai, Russell Mendonca, Zheyang Qin, Katerina Fragkiadaki, Deepak Pathak
cs.AI
Abstract
We have made significant progress towards building foundational video diffusion models. As these models are trained using large-scale unsupervised data, it has become crucial to adapt them to specific downstream tasks. Adapting these models via supervised fine-tuning requires collecting target datasets of videos, which is challenging and tedious. In this work, we utilize pre-trained reward models, learned via preferences on top of powerful vision discriminative models, to adapt video diffusion models. These reward models contain dense gradient information with respect to the generated RGB pixels, which is critical for efficient learning in complex search spaces such as videos. We show that backpropagating gradients from these reward models to a video diffusion model allows for compute- and sample-efficient alignment of the video diffusion model. We show results across a variety of reward models and video diffusion models, demonstrating that our approach learns much more efficiently, in terms of reward queries and computation, than prior gradient-free approaches. Our code, model weights, and more visualizations are available at https://vader-vid.github.io.
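The core idea, propagating the gradient of a differentiable reward on generated pixels back into the diffusion model's trainable weights, can be pictured as a single training step. The snippet below is a minimal sketch, not the released VADER implementation: `video_diffusion`, `reward_model`, and the `keep_grad` sampling flag are hypothetical placeholders standing in for a video diffusion sampler that retains its computation graph and a preference-trained, differentiable reward model.

```python
import torch

def reward_gradient_step(
    video_diffusion,            # hypothetical sampler with a differentiable .sample()
    reward_model,               # hypothetical preference-trained, differentiable reward model
    prompts: list[str],
    optimizer: torch.optim.Optimizer,
) -> float:
    """One illustrative alignment step: sample videos, score them with a
    differentiable reward, and backpropagate the reward gradient into the
    diffusion model's trainable parameters (e.g. adapter weights)."""
    optimizer.zero_grad()
    # Sample videos while retaining the computation graph so gradients can
    # flow from the generated RGB pixels back into the model. `keep_grad`
    # is a placeholder flag, not the API of any particular released pipeline.
    videos: torch.Tensor = video_diffusion.sample(prompts, keep_grad=True)  # (B, T, C, H, W)
    rewards: torch.Tensor = reward_model(videos, prompts)                   # (B,)
    loss = -rewards.mean()   # gradient ascent on the expected reward
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice, holding the full sampling graph in memory is expensive, so implementations of this style of alignment typically limit how many denoising steps gradients flow through and update only a small set of parameters; the exact memory-saving choices are not specified in the abstract above.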