RealDPO：実在か非実在か、それが選好である

要旨

ビデオ生成モデルは近年、合成品質において顕著な進歩を遂げています。しかし、複雑な動きの生成は依然として重要な課題であり、既存のモデルは自然で滑らかかつ文脈に合致した動きを生成するのに苦戦しています。生成された動きと現実世界の動きとのこのギャップは、実用性を制限しています。この問題に対処するため、我々はRealDPOを提案します。これは、現実世界のデータを選好学習のための正例として活用する新しいアライメントパラダイムであり、より正確な動きの合成を可能にします。従来の教師あり微調整（SFT）が提供する限定的な修正フィードバックとは異なり、RealDPOはDirect Preference Optimization（DPO）を採用し、動きのリアリズムを向上させるための独自の損失関数を使用します。現実世界のビデオと誤ったモデル出力を対比させることで、RealDPOは反復的な自己修正を可能にし、動きの品質を段階的に向上させます。複雑な動きの合成におけるポストトレーニングを支援するため、我々はRealAction-5Kを提案します。これは、人間の日常活動を捉えた高品質なビデオのキュレーションデータセットであり、豊かで精密な動きの詳細を含んでいます。大規模な実験により、RealDPOが最先端のモデルや既存の選好最適化技術と比較して、ビデオ品質、テキストアライメント、および動きのリアリズムを大幅に向上させることが実証されています。

English

Video generative models have recently achieved notable advancements in synthesis quality. However, generating complex motions remains a critical challenge, as existing models often struggle to produce natural, smooth, and contextually consistent movements. This gap between generated and real-world motions limits their practical applicability. To address this issue, we introduce RealDPO, a novel alignment paradigm that leverages real-world data as positive samples for preference learning, enabling more accurate motion synthesis. Unlike traditional supervised fine-tuning (SFT), which offers limited corrective feedback, RealDPO employs Direct Preference Optimization (DPO) with a tailored loss function to enhance motion realism. By contrasting real-world videos with erroneous model outputs, RealDPO enables iterative self-correction, progressively refining motion quality. To support post-training in complex motion synthesis, we propose RealAction-5K, a curated dataset of high-quality videos capturing human daily activities with rich and precise motion details. Extensive experiments demonstrate that RealDPO significantly improves video quality, text alignment, and motion realism compared to state-of-the-art models and existing preference optimization techniques.

RealDPO：実在か非実在か、それが選好である

RealDPO: Real or Not Real, that is the Preference

要旨

Support