Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision

Abstract

Perceiving the world from both egocentric (first-person) and exocentric (third-person) perspectives is fundamental to human cognition, enabling a rich and complementary understanding of dynamic environments. In recent years, enabling machines to leverage the synergistic potential of these dual perspectives has emerged as a compelling research direction in video understanding. In this survey, we provide a comprehensive review of video understanding from both exocentric and egocentric viewpoints. We begin by highlighting practical applications that integrate egocentric and exocentric techniques, envisioning their potential collaboration across domains. We then identify the key research tasks needed to realize these applications. Next, we systematically organize and review recent advances along three main research directions: (1) leveraging egocentric data to enhance exocentric understanding, (2) utilizing exocentric data to improve egocentric analysis, and (3) joint learning frameworks that unify both perspectives. For each direction, we analyze a diverse set of tasks and relevant works. Additionally, we discuss benchmark datasets that support research in both perspectives, evaluating their scope, diversity, and applicability. Finally, we discuss the limitations of current work and propose promising future research directions. By synthesizing insights from both perspectives, our goal is to inspire advances in video understanding and artificial intelligence, bringing machines closer to perceiving the world in a human-like manner. A GitHub repository of related works can be found at https://github.com/ayiyayi/Awesome-Egocentric-and-Exocentric-Vision.
