Learning Human-Object Interaction for 3D Human Pose Estimation from LiDAR Point Clouds

Abstract

Understanding humans from LiDAR point clouds is one of the most critical tasks in autonomous driving because of its close relationship to pedestrian safety, yet it remains challenging in the presence of diverse human-object interactions and cluttered backgrounds. Nevertheless, existing methods largely overlook the potential of human-object interactions for building robust 3D human pose estimation frameworks. Two major challenges motivate incorporating human-object interaction. First, human-object interactions introduce spatial ambiguity between human and object points, which often leads to erroneous 3D human keypoint predictions in interaction regions. Second, there is a severe class imbalance in the number of points between interacting and non-interacting body parts: interaction-frequent regions such as the hands and feet are only sparsely observed in LiDAR data. To address these challenges, we propose a Human-Object Interaction Learning (HOIL) framework for robust 3D human pose estimation from LiDAR point clouds. To mitigate spatial ambiguity, we present human-object interaction-aware contrastive learning (HOICL), which enhances feature discrimination between human and object points, particularly in interaction regions. To alleviate class imbalance, we introduce contact-aware part-guided pooling (CPPool), which adaptively reallocates representational capacity by compressing overrepresented points while preserving informative points from interacting body parts. In addition, we present an optional contact-based temporal refinement that corrects erroneous per-frame keypoint estimates using contact cues over time. As a result, HOIL leverages human-object interaction to resolve spatial ambiguity and class imbalance in interaction regions. Code will be released.
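The capacity-reallocation idea behind contact-aware pooling can be illustrated with a small, self-contained sketch. This is not the authors' CPPool, whose details the abstract does not give. It only assumes that each point already carries a body-part label and that the set of contact parts is known; the function name `part_balanced_subsample`, the `budget` parameter, and the part names are all hypothetical. Points from contact parts are kept in full, and the remaining budget is split evenly across the other parts, so overrepresented parts such as the torso are compressed.

```python
import random
from collections import defaultdict


def part_balanced_subsample(part_labels, contact_parts, budget, seed=0):
    """Return sorted indices of points to keep (illustrative, not CPPool).

    part_labels   : one body-part name per point (assumed to be given)
    contact_parts : set of part names treated as interacting (assumed)
    budget        : target number of points to keep overall
    """
    rng = random.Random(seed)

    # Group point indices by body part.
    by_part = defaultdict(list)
    for idx, part in enumerate(part_labels):
        by_part[part].append(idx)

    # Keep every point of the sparse, interaction-heavy parts.
    kept = []
    for part in contact_parts:
        kept.extend(by_part.get(part, []))

    # Split what is left of the budget evenly across the other parts,
    # which compresses parts that have far more points than their share.
    other_parts = [p for p in by_part if p not in contact_parts]
    if other_parts:
        per_part = max(budget - len(kept), 0) // len(other_parts)
        for part in other_parts:
            indices = by_part[part]
            kept.extend(rng.sample(indices, min(per_part, len(indices))))

    return sorted(kept)


# Toy example: dense torso and leg, sparse hand and foot.
labels = ["torso"] * 100 + ["leg"] * 40 + ["hand"] * 5 + ["foot"] * 3
kept = part_balanced_subsample(labels, {"hand", "foot"}, budget=28)
kept_labels = [labels[i] for i in kept]
```

In this toy input, all 8 hand and foot points survive while the torso and leg are each reduced to 10 points. A learned pooling module would presumably operate on features and predicted contact rather than on fixed labels.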
