Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision
June 6, 2025
Authors: Yuping He, Yifei Huang, Guo Chen, Lidong Lu, Baoqi Pei, Jilan Xu, Tong Lu, Yoichi Sato
cs.AI
Abstract
Perceiving the world from both egocentric (first-person) and exocentric
(third-person) perspectives is fundamental to human cognition, enabling rich
and complementary understanding of dynamic environments. In recent years,
allowing machines to leverage the synergistic potential of these dual
perspectives has emerged as a compelling research direction in video
understanding. In this survey, we provide a comprehensive review of video
understanding from both exocentric and egocentric viewpoints. We begin by
highlighting the practical applications of integrating egocentric and
exocentric techniques, envisioning their potential collaboration across
domains. We then identify key research tasks to realize these applications.
Next, we systematically organize and review recent advancements into three main
research directions: (1) leveraging egocentric data to enhance exocentric
understanding, (2) utilizing exocentric data to improve egocentric analysis,
and (3) joint learning frameworks that unify both perspectives. For each
direction, we analyze a diverse set of tasks and relevant works. Additionally,
we discuss benchmark datasets that support research in both perspectives,
evaluating their scope, diversity, and applicability. Finally, we examine the
limitations of current works and propose promising future research directions.
By synthesizing insights from both perspectives, our goal is to inspire
advancements in video understanding and artificial intelligence, bringing
machines closer to perceiving the world in a human-like manner. A GitHub repo
of related works can be found at
https://github.com/ayiyayi/Awesome-Egocentric-and-Exocentric-Vision.