EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing

December 5, 2025
作者: Runjia Li, Moayed Haji-Ali, Ashkan Mirzaei, Chaoyang Wang, Arpit Sahni, Ivan Skorokhodov, Aliaksandr Siarohin, Tomas Jakab, Junlin Han, Sergey Tulyakov, Philip Torr, Willi Menapace
cs.AI

Abstract

We study instruction-guided editing of egocentric videos for interactive AR applications. While recent AI video editors perform well on third-person footage, egocentric views present unique challenges, including rapid egomotion and frequent hand-object interactions, that create a significant domain gap. Moreover, existing offline editing pipelines suffer from high latency, limiting real-time interaction. To address these issues, we present a complete ecosystem for egocentric video editing. First, we construct EgoEditData, a manually curated dataset designed specifically for egocentric editing scenarios, featuring rich hand-object interactions while explicitly preserving hands. Second, we develop EgoEdit, an instruction-following egocentric video editor that supports real-time streaming inference on a single GPU. Finally, we introduce EgoEditBench, an evaluation suite targeting instruction faithfulness, hand and interaction preservation, and temporal stability under egomotion. Across both egocentric and general editing tasks, EgoEdit produces temporally stable, instruction-faithful results with interactive latency. It achieves clear gains on egocentric editing benchmarks, where existing methods struggle, while maintaining performance comparable to the strongest baselines on general editing tasks. EgoEditData and EgoEditBench will be made publicly available to the research community. See our website at https://snap-research.github.io/EgoEdit
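The property the abstract emphasizes is chunk-level streaming: edited frames are emitted as they are produced, so latency scales with a chunk rather than the full clip. Below is a minimal, hypothetical Python sketch of such a causal streaming loop; `StreamingEditor`, `edit_chunk`, `CHUNK`, and `CONTEXT` are illustrative names and values of our own, not the paper's actual API or architecture.

```python
"""Minimal sketch of a causal, streaming video-editing loop.
All names and constants here are hypothetical illustrations of the
streaming-inference idea described in the abstract, not EgoEdit's API."""
from collections import deque

import numpy as np

CHUNK = 4     # frames processed per step (hypothetical)
CONTEXT = 16  # causal context window kept in memory (hypothetical)


class StreamingEditor:
    """Placeholder for an instruction-conditioned editor; a real system
    would run a video editing model here instead of an identity pass."""

    def __init__(self, instruction: str):
        self.instruction = instruction
        # Past frames retained for temporal stability under egomotion.
        self.context = deque(maxlen=CONTEXT)

    def edit_chunk(self, frames: np.ndarray) -> np.ndarray:
        # Stand-in "edit": identity. A real model would condition on
        # self.instruction and self.context to produce edited frames.
        self.context.extend(frames)
        return frames


def stream(frames: np.ndarray, instruction: str):
    """Yield edited frames chunk by chunk, so output begins after the
    first chunk rather than after the whole clip is processed."""
    editor = StreamingEditor(instruction)
    for start in range(0, len(frames), CHUNK):
        yield editor.edit_chunk(frames[start:start + CHUNK])


if __name__ == "__main__":
    video = np.zeros((32, 64, 64, 3), dtype=np.uint8)  # dummy 32-frame clip
    for out in stream(video, "turn the mug into a teapot"):
        print("emitted chunk of", len(out), "frames")
```

The design point the sketch isolates is causality: each chunk depends only on the instruction and previously seen frames, which is what allows interactive latency, in contrast to offline pipelines that must observe the entire clip before emitting any output.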