Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision

June 6, 2025
Authors: Yuping He, Yifei Huang, Guo Chen, Lidong Lu, Baoqi Pei, Jilan Xu, Tong Lu, Yoichi Sato
cs.AI

Abstract

Perceiving the world from both egocentric (first-person) and exocentric (third-person) perspectives is fundamental to human cognition, enabling a rich and complementary understanding of dynamic environments. In recent years, enabling machines to leverage the synergistic potential of these dual perspectives has emerged as a compelling research direction in video understanding. In this survey, we provide a comprehensive review of video understanding from both exocentric and egocentric viewpoints. We begin by highlighting the practical applications of integrating egocentric and exocentric techniques, envisioning their potential collaboration across domains. We then identify the key research tasks needed to realize these applications. Next, we systematically organize recent advances into three main research directions: (1) leveraging egocentric data to enhance exocentric understanding, (2) utilizing exocentric data to improve egocentric analysis, and (3) joint learning frameworks that unify both perspectives. For each direction, we analyze a diverse set of tasks and relevant works. Additionally, we review benchmark datasets that support research in both perspectives, evaluating their scope, diversity, and applicability. Finally, we discuss the limitations of current work and propose promising future research directions. By synthesizing insights from both perspectives, we aim to inspire advances in video understanding and artificial intelligence, bringing machines closer to perceiving the world in a human-like manner. A GitHub repository of related work is available at https://github.com/ayiyayi/Awesome-Egocentric-and-Exocentric-Vision.
