

Temporal Preference Optimization for Long-Form Video Understanding

January 23, 2025
Authors: Rui Li, Xiaohan Wang, Yuhui Zhang, Zeyu Wang, Serena Yeung-Levy
cs.AI

Abstract

Despite significant advancements in video large multimodal models (video-LMMs), achieving effective temporal grounding in long-form videos remains a challenge for existing models. To address this limitation, we propose Temporal Preference Optimization (TPO), a novel post-training framework designed to enhance the temporal grounding capabilities of video-LMMs through preference learning. TPO adopts a self-training approach that enables models to differentiate between well-grounded and less accurate temporal responses by leveraging curated preference datasets at two granularities: localized temporal grounding, which focuses on specific video segments, and comprehensive temporal grounding, which captures extended temporal dependencies across entire video sequences. By optimizing on these preference datasets, TPO significantly enhances temporal understanding while reducing reliance on manually annotated data. Extensive experiments on three long-form video understanding benchmarks (LongVideoBench, MLVU, and Video-MME) demonstrate the effectiveness of TPO across two state-of-the-art video-LMMs. Notably, LLaVA-Video-TPO establishes itself as the leading 7B model on the Video-MME benchmark, underscoring the potential of TPO as a scalable and efficient solution for advancing temporal reasoning in long-form video understanding. Project page: https://ruili33.github.io/tpo_website.
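The abstract does not state the exact training objective. As a rough illustration of what preference learning over a well-grounded ("chosen") and a less accurate ("rejected") response can look like, here is a minimal sketch assuming a DPO-style (Direct Preference Optimization) loss. The `PreferencePair` record, its field names, and the default `beta=0.1` are hypothetical choices for illustration, not TPO's actual data format, loss, or hyperparameters. Log-probabilities are plain floats standing in for sequence log-likelihoods from the video-LMM being trained and from a frozen reference copy of it.

```python
import math
from dataclasses import dataclass
from typing import Literal


@dataclass
class PreferencePair:
    """Hypothetical preference example (field names are illustrative only)."""
    granularity: Literal["localized", "comprehensive"]  # segment-level vs whole-video grounding
    query: str
    chosen: str    # well-grounded response
    rejected: str  # less accurately grounded response


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one pair: -log(sigmoid(beta * margin)).

    The margin measures how much more the policy (relative to the frozen
    reference) prefers the chosen response over the rejected one.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(m)) == log(1 + exp(-m)) == softplus(-m)
    return softplus(-margin)


# Usage with a made-up pair and made-up log-probabilities.
pair = PreferencePair("localized", "What happens right after the door opens?",
                      chosen="A dog runs into the room.",
                      rejected="Nothing happens.")
print(pair.granularity, dpo_loss(-12.0, -15.0, -13.0, -14.0))
```

With identical policy and reference log-probabilities the loss equals log 2 (about 0.693). It falls below that as the policy raises the likelihood of the well-grounded response relative to the less accurate one, and rises above it in the opposite case.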
