Learning Human-Object Interaction for 3D Human Pose Estimation from LiDAR Point Clouds

Abstract

Understanding humans from LiDAR point clouds is one of the most critical tasks in autonomous driving because of its close relationship to pedestrian safety, yet it remains challenging in the presence of diverse human-object interactions and cluttered backgrounds. Existing methods nevertheless largely overlook the potential of human-object interactions for building robust 3D human pose estimation frameworks. Two major challenges motivate the incorporation of human-object interaction. First, human-object interactions introduce spatial ambiguity between human and object points, which often leads to erroneous 3D human keypoint predictions in interaction regions. Second, there is a severe class imbalance in the number of points between interacting and non-interacting body parts: interaction-frequent regions such as the hands and feet are only sparsely observed in LiDAR data. To address these challenges, we propose a Human-Object Interaction Learning (HOIL) framework for robust 3D human pose estimation from LiDAR point clouds. To mitigate the spatial ambiguity issue, we present human-object interaction-aware contrastive learning (HOICL), which enhances feature discrimination between human and object points, particularly in interaction regions. To alleviate the class imbalance issue, we introduce contact-aware part-guided pooling (CPPool), which adaptively reallocates representational capacity by compressing overrepresented points while preserving informative points from interacting body parts. In addition, we present an optional contact-based temporal refinement that corrects erroneous per-frame keypoint estimates using contact cues over time. As a result, HOIL leverages human-object interaction to resolve spatial ambiguity and class imbalance in interaction regions. Code will be released.
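
To make the idea of contrastive learning between human and object points more concrete, the following is a minimal sketch of a generic supervised contrastive loss over per-point features. It is not the HOICL formulation, which the abstract does not specify. The function name `point_contrastive_loss`, the optional `weights` argument (here imagined as a way to emphasize points near a contact region), and the `temperature` value are all assumptions made for illustration.

```python
import numpy as np


def point_contrastive_loss(features, labels, weights=None, temperature=0.1):
    """Generic supervised contrastive loss over per-point features.

    Illustrative only; not the authors' HOICL loss.

    features: (N, D) per-point embeddings, N >= 2.
    labels:   (N,) integer labels, e.g. 1 for human and 0 for object points.
    weights:  optional (N,) per-anchor weights (hypothetical; e.g. larger
              for points assumed to lie in an interaction region).
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    n = features.shape[0]

    # L2-normalize so that dot products are cosine similarities.
    norms = np.maximum(np.linalg.norm(features, axis=1, keepdims=True), 1e-12)
    z = features / norms

    # Pairwise similarities scaled by temperature; self-pairs are excluded.
    not_self = ~np.eye(n, dtype=bool)
    logits = np.where(not_self, z @ z.T / temperature, -np.inf)

    # Numerically stable log-softmax over all other points.
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Positives: other points that share the anchor's label.
    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0

    # Negative mean log-probability of the positives for each anchor.
    per_anchor = -np.where(positives, log_prob, 0.0).sum(axis=1)
    per_anchor = per_anchor / np.maximum(pos_count, 1)

    w = np.ones(n) if weights is None else np.asarray(weights, dtype=np.float64)
    return float((w[valid] * per_anchor[valid]).sum() / w[valid].sum())
```

Under this sketch, features that cluster by label yield a loss near zero, while features that mix human and object points yield a large loss, which is the kind of pressure toward feature discrimination that the abstract describes.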
