RealDPO: Real or Not Real, that is the Preference
October 16, 2025
Authors: Guo Cheng, Danni Yang, Ziqi Huang, Jianlou Si, Chenyang Si, Ziwei Liu
cs.AI
Abstract
Video generative models have recently achieved notable advancements in
synthesis quality. However, generating complex motions remains a critical
challenge, as existing models often struggle to produce natural, smooth, and
contextually consistent movements. This gap between generated and real-world
motions limits their practical applicability. To address this issue, we
introduce RealDPO, a novel alignment paradigm that leverages real-world data as
positive samples for preference learning, enabling more accurate motion
synthesis. Unlike traditional supervised fine-tuning (SFT), which offers
limited corrective feedback, RealDPO employs Direct Preference Optimization
(DPO) with a tailored loss function to enhance motion realism. By contrasting
real-world videos with erroneous model outputs, RealDPO enables iterative
self-correction, progressively refining motion quality. To support
post-training in complex motion synthesis, we propose RealAction-5K, a curated
dataset of high-quality videos capturing human daily activities with rich and
precise motion details. Extensive experiments demonstrate that RealDPO
significantly improves video quality, text alignment, and motion realism
compared to state-of-the-art models and existing preference optimization
techniques.
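
The abstract states only that RealDPO builds on Direct Preference Optimization with a tailored loss, contrasting real-world videos (positives) against erroneous model outputs (negatives); the exact loss is not given here. As a point of reference, the minimal sketch below shows the standard DPO objective under that pairing. All names (`dpo_loss`, `beta`, the log-probability arguments) are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp: torch.Tensor,
             policy_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor,
             ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective (illustrative): increase the policy's
    preference for the real-video ("chosen") sample over the model's
    erroneous output ("rejected"), relative to a frozen reference model."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # -log sigmoid(beta * preference gap); minimizing widens the gap
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy usage with random per-sample log-probabilities (batch of 4)
loss = dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))
```

Note that the paper's tailored loss for motion realism may differ from this textbook formulation; the sketch is only meant to make the preference-pairing idea concrete.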