MMHU: A Massive-Scale Multimodal Benchmark for Human Behavior Understanding

Abstract

Humans are integral components of the transportation ecosystem, and understanding their behaviors is crucial to facilitating the development of safe driving systems. Although recent progress has explored various aspects of human behavior (such as motion, trajectories, and intention), a comprehensive benchmark for evaluating human behavior understanding in autonomous driving remains unavailable. In this work, we propose MMHU, a large-scale benchmark for human behavior analysis featuring rich annotations, such as human motion and trajectories, text descriptions of human motions, human intention, and critical behavior labels relevant to driving safety. Our dataset encompasses 57k human motion clips and 1.73M frames gathered from diverse sources, including established driving datasets such as Waymo, in-the-wild videos from YouTube, and self-collected data. A human-in-the-loop annotation pipeline is developed to generate rich behavior captions. We provide a thorough dataset analysis and benchmark multiple tasks, ranging from motion prediction to motion generation and human behavior question answering, thereby offering a broad evaluation suite. Project page: https://MMHU-Benchmark.github.io
