Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision
June 6, 2025
Authors: Yuping He, Yifei Huang, Guo Chen, Lidong Lu, Baoqi Pei, Jilan Xu, Tong Lu, Yoichi Sato
cs.AI
Abstract
Perceiving the world from both egocentric (first-person) and exocentric
(third-person) perspectives is fundamental to human cognition, enabling rich
and complementary understanding of dynamic environments. In recent years,
enabling machines to leverage the synergistic potential of these dual
perspectives has emerged as a compelling research direction in video
understanding. In this survey, we provide a comprehensive review of video
understanding from both exocentric and egocentric viewpoints. We begin by
highlighting the practical applications of integrating egocentric and
exocentric techniques, envisioning their potential collaboration across
domains. We then identify key research tasks to realize these applications.
Next, we systematically organize and review recent advancements into three main
research directions: (1) leveraging egocentric data to enhance exocentric
understanding, (2) utilizing exocentric data to improve egocentric analysis,
and (3) joint learning frameworks that unify both perspectives. For each
direction, we analyze a diverse set of tasks and relevant works. Additionally,
we discuss benchmark datasets that support research in both perspectives,
evaluating their scope, diversity, and applicability. Finally, we examine the
limitations of current work and propose promising future research directions.
By synthesizing insights from both perspectives, our goal is to inspire
advancements in video understanding and artificial intelligence, bringing
machines closer to perceiving the world in a human-like manner. A GitHub repo
of related works can be found at
https://github.com/ayiyayi/Awesome-Egocentric-and-Exocentric-Vision.