MMHU: A Massive-Scale Multimodal Benchmark for Human Behavior Understanding

Abstract

Humans are integral components of the transportation ecosystem, and understanding their behaviors is crucial to facilitating the development of safe driving systems. Although recent work has explored various aspects of human behavior, such as motion, trajectories, and intention, a comprehensive benchmark for evaluating human behavior understanding in autonomous driving remains unavailable. In this work, we propose MMHU, a large-scale benchmark for human behavior analysis featuring rich annotations, such as human motion and trajectories, text descriptions of human motion, human intention, and critical behavior labels relevant to driving safety. Our dataset encompasses 57k human motion clips and 1.73M frames gathered from diverse sources, including established driving datasets such as Waymo, in-the-wild videos from YouTube, and self-collected data. We develop a human-in-the-loop annotation pipeline to generate rich behavior captions. We provide a thorough dataset analysis and benchmark multiple tasks, ranging from motion prediction to motion generation and human behavior question answering, thereby offering a broad evaluation suite. Project page: https://MMHU-Benchmark.github.io.
