Video Diffusion Alignment via Reward Gradients
July 11, 2024
Authors: Mihir Prabhudesai, Russell Mendonca, Zheyang Qin, Katerina Fragkiadaki, Deepak Pathak
cs.AI
Abstract
We have made significant progress towards building foundational video diffusion models. As these models are trained using large-scale unsupervised data, it has become crucial to adapt them to specific downstream tasks. Adapting these models via supervised fine-tuning requires collecting target datasets of videos, which is challenging and tedious. In this work, we utilize pre-trained reward models that are learned via preferences on top of powerful vision discriminative models to adapt video diffusion models. These reward models contain dense gradient information with respect to generated RGB pixels, which is critical for efficient learning in complex search spaces, such as videos. We show that backpropagating gradients from these reward models to a video diffusion model allows for compute- and sample-efficient alignment of the video diffusion model. We show results across a variety of reward models and video diffusion models, demonstrating that our approach learns much more efficiently in terms of reward queries and computation than prior gradient-free approaches. Our code, model weights, and more visualizations are available at https://vader-vid.github.io.
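To make the core idea concrete, below is a minimal, self-contained PyTorch sketch of reward-gradient alignment. The ToyDenoiser and ToyRewardModel classes are illustrative stand-ins, not the actual VADER models or API: a frozen, differentiable reward model scores generated RGB frames, and the gradient of that reward is backpropagated through the pixels into the diffusion model's weights.

```python
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Illustrative stand-in for the video diffusion model's denoising network."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(channels, 16, kernel_size=3, padding=1), nn.SiLU(),
            nn.Conv3d(16, channels, kernel_size=3, padding=1),
        )

    def forward(self, x_t):
        # Predicts the noise contained in the noisy video x_t.
        return self.net(x_t)

class ToyRewardModel(nn.Module):
    """Illustrative stand-in for a frozen, differentiable reward on RGB frames."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(channels, 8, kernel_size=3, padding=1), nn.SiLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(8, 1),
        )

    def forward(self, video):
        return self.net(video).squeeze(-1)  # one scalar reward per video

denoiser = ToyDenoiser()
reward_model = ToyRewardModel().requires_grad_(False)  # reward model stays frozen
optimizer = torch.optim.AdamW(denoiser.parameters(), lr=1e-4)

# One alignment step: generate a video with gradients enabled, score it,
# and backpropagate the reward gradient through the pixels into the denoiser.
x_t = torch.randn(2, 3, 8, 32, 32)   # (batch, channels, frames, height, width)
x_0 = x_t - denoiser(x_t)            # crude single-step denoising, for illustration only
reward = reward_model(x_0).mean()
loss = -reward                       # gradient ascent on the reward
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"reward: {reward.item():.4f}")
```

In the real setting the video is produced by an iterative denoising chain rather than a single step, and backpropagating through it raises memory and stability considerations that this sketch does not address.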