

RealDPO: Real or Not Real, that is the Preference

October 16, 2025
Authors: Guo Cheng, Danni Yang, Ziqi Huang, Jianlou Si, Chenyang Si, Ziwei Liu
cs.AI

Abstract

Video generative models have recently achieved notable advancements in synthesis quality. However, generating complex motions remains a critical challenge, as existing models often struggle to produce natural, smooth, and contextually consistent movements. This gap between generated and real-world motions limits their practical applicability. To address this issue, we introduce RealDPO, a novel alignment paradigm that leverages real-world data as positive samples for preference learning, enabling more accurate motion synthesis. Unlike traditional supervised fine-tuning (SFT), which offers limited corrective feedback, RealDPO employs Direct Preference Optimization (DPO) with a tailored loss function to enhance motion realism. By contrasting real-world videos with erroneous model outputs, RealDPO enables iterative self-correction, progressively refining motion quality. To support post-training in complex motion synthesis, we propose RealAction-5K, a curated dataset of high-quality videos capturing human daily activities with rich and precise motion details. Extensive experiments demonstrate that RealDPO significantly improves video quality, text alignment, and motion realism compared to state-of-the-art models and existing preference optimization techniques.
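For orientation, the sketch below gives the standard DPO objective that this kind of alignment instantiates; it is the generic form only, not RealDPO's tailored loss, and the notation ($x^w$, $x^l$, $c$, $\beta$, $\pi_{\mathrm{ref}}$) is illustrative rather than taken from the paper. The real-world video plays the role of the preferred sample $x^w$ and the erroneous model output the dispreferred sample $x^l$ for a shared text prompt $c$:

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(c,\,x^w,\,x^l)}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(x^w \mid c)}{\pi_{\mathrm{ref}}(x^w \mid c)} \;-\; \beta \log \frac{\pi_\theta(x^l \mid c)}{\pi_{\mathrm{ref}}(x^l \mid c)}\right)\right]
$$

Here $\pi_\theta$ is the model being aligned, $\pi_{\mathrm{ref}}$ a frozen reference copy, $\sigma$ the sigmoid, and $\beta$ a temperature controlling how far the aligned model may drift from the reference; RealDPO adapts this preference formulation to video generation with real data fixed on the preferred side.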