Video-R1: Reinforcing Video Reasoning in MLLMs
March 27, 2025
Authors: Kaituo Feng, Kaixiong Gong, Bohao Li, Zonghao Guo, Yibing Wang, Tianshuo Peng, Benyou Wang, Xiangyu Yue
cs.AI
Abstract
Inspired by DeepSeek-R1's success in eliciting reasoning abilities through
rule-based reinforcement learning (RL), we introduce Video-R1 as the first
attempt to systematically explore the R1 paradigm for eliciting video reasoning
within multimodal large language models (MLLMs). However, directly applying RL
training with the GRPO algorithm to video reasoning presents two primary
challenges: (i) a lack of temporal modeling for video reasoning, and (ii) the
scarcity of high-quality video-reasoning data. To address these issues, we
first propose the T-GRPO algorithm, which encourages models to utilize temporal
information in videos for reasoning. Additionally, instead of relying solely on
video data, we incorporate high-quality image-reasoning data into the training
process. We have constructed two datasets: Video-R1-COT-165k for SFT cold start
and Video-R1-260k for RL training, both comprising image and video data.
Experimental results demonstrate that Video-R1 achieves significant
improvements on video reasoning benchmarks such as VideoMMMU and VSI-Bench, as
well as on general video benchmarks including MVBench and TempCompass.
Notably, Video-R1-7B attains 35.8% accuracy on the video spatial reasoning
benchmark VSI-Bench, surpassing the commercial proprietary model GPT-4o. All
code, models, and data are released.
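
The abstract names GRPO and T-GRPO without giving their details. As a rough illustration only, the Python sketch below shows the group-relative advantage normalization commonly associated with GRPO, together with a hypothetical temporal incentive. The `temporal_bonus` function, its `bonus` value, and the ordered-versus-shuffled comparison are assumptions made for this example and are not taken from the abstract.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize each reward against its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def temporal_bonus(ordered_correct, shuffled_correct, bonus=0.3):
    """Hypothetical temporal incentive (an assumption for this sketch).

    Grants a bonus only when answers sampled with frames in their original
    order are correct more often than answers sampled with shuffled frames.
    """
    p_ordered = mean(ordered_correct)
    p_shuffled = mean(shuffled_correct)
    return bonus if p_ordered > p_shuffled else 0.0


# Rule-based correctness (1 = correct) for one group of sampled answers.
ordered_correct = [1, 1, 0, 1]   # frames in original order
shuffled_correct = [0, 1, 0, 0]  # frames shuffled

b = temporal_bonus(ordered_correct, shuffled_correct)
# In this sketch, only correct ordered-frame answers receive the bonus.
rewards = [c + b * c for c in ordered_correct]
advantages = group_relative_advantages(rewards)
print(advantages)
```

In this setup, a model that answers equally well on shuffled frames earns no bonus, so the extra reward is tied to actually using frame order.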