
VideoRoPE: What Makes for Good Video Rotary Position Embedding?

February 7, 2025
Authors: Xilin Wei, Xiaoran Liu, Yuhang Zang, Xiaoyi Dong, Pan Zhang, Yuhang Cao, Jian Tong, Haodong Duan, Qipeng Guo, Jiaqi Wang, Xipeng Qiu, Dahua Lin
cs.AI

Abstract

While Rotary Position Embedding (RoPE) and its variants are widely adopted for their long-context capabilities, extending 1D RoPE to video, with its complex spatio-temporal structure, remains an open challenge. This work first introduces a comprehensive analysis that identifies four key characteristics essential for the effective adaptation of RoPE to video, which have not been fully considered in prior work. As part of our analysis, we introduce a challenging V-NIAH-D (Visual Needle-In-A-Haystack with Distractors) task, which adds periodic distractors into V-NIAH. The V-NIAH-D task demonstrates that previous RoPE variants, lacking appropriate temporal dimension allocation, are easily misled by distractors. Based on our analysis, we introduce VideoRoPE, with a 3D structure designed to preserve spatio-temporal relationships. VideoRoPE features low-frequency temporal allocation to mitigate periodic oscillations, a diagonal layout to maintain spatial symmetry, and adjustable temporal spacing to decouple temporal and spatial indexing. VideoRoPE consistently surpasses previous RoPE variants across diverse downstream tasks such as long video retrieval, video understanding, and video hallucination. Our code will be available at https://github.com/Wiselnn570/VideoRoPE.
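
To give some intuition for why a low-frequency allocation can mitigate periodic oscillations, the sketch below computes the rotation period of each dimension pair under the standard 1D RoPE frequency schedule, theta_i = base^(-2i/d). It is not the VideoRoPE implementation; the head dimension of 128 and the base of 10000 are illustrative assumptions, and the helper names are hypothetical.

```python
import math


def rope_frequencies(head_dim: int, base: float = 10000.0) -> list[float]:
    """Standard 1D RoPE angular frequencies, one per dimension pair.

    head_dim and base are illustrative assumptions, not values taken
    from the VideoRoPE paper.
    """
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def period(theta: float) -> float:
    """Number of positions after which a pair's rotation angle wraps by 2*pi."""
    return 2 * math.pi / theta


freqs = rope_frequencies(head_dim=128)

# The first pair rotates fastest, the last pair slowest.
highest, lowest = freqs[0], freqs[-1]
print(f"highest-frequency pair: period ~ {period(highest):.1f} positions")
print(f"lowest-frequency pair:  period ~ {period(lowest):.1f} positions")
```

A high-frequency pair returns to the same angle after only a few positions, so two frames that are far apart can look alike along that pair. A low-frequency pair takes far longer to wrap, which fits the abstract's observation that periodic distractors mislead variants lacking an appropriate temporal dimension allocation.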
