MotionLM: Multi-Agent Motion Forecasting as Language Modeling

Abstract



Reliable forecasting of the future behavior of road agents is a critical component of safe planning in autonomous vehicles. Here, we represent continuous trajectories as sequences of discrete motion tokens and cast multi-agent motion prediction as a language modeling task over this domain. Our model, MotionLM, provides several advantages: First, it does not require anchors or explicit latent variable optimization to learn multimodal distributions. Instead, we leverage a single standard language modeling objective, maximizing the average log probability over sequence tokens. Second, our approach bypasses post-hoc interaction heuristics where individual agent trajectory generation is conducted prior to interactive scoring. Instead, MotionLM produces joint distributions over interactive agent futures in a single autoregressive decoding process. In addition, the model's sequential factorization enables temporally causal conditional rollouts. The proposed approach establishes new state-of-the-art performance for multi-agent motion prediction on the Waymo Open Motion Dataset, ranking 1st on the interactive challenge leaderboard.
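The abstract names three mechanisms without giving their details: turning continuous trajectories into discrete motion tokens, training with a single objective that maximizes the average log probability of the tokens, and decoding all interacting agents in one autoregressive sequence. The following is a minimal Python sketch of those ideas under assumptions of our own; it is not the MotionLM implementation. The bin count (`NUM_BINS`), displacement range (`MAX_DELTA`), the closed-loop quantization in `tokenize`, and the time-major agent ordering in `interleave` are all hypothetical choices for illustration. The per-position logits would come from a sequence model, which is not shown.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical discretization settings chosen for illustration only;
# they are not taken from the MotionLM paper.
NUM_BINS = 13      # bins per coordinate, so the vocabulary has NUM_BINS**2 tokens
MAX_DELTA = 3.0    # largest per-step displacement (metres) the bins cover
STEP = 2 * MAX_DELTA / (NUM_BINS - 1)  # width of one bin

Point = Tuple[float, float]


def quantize(delta: float) -> int:
    """Map a continuous per-step displacement to the nearest bin index."""
    clipped = max(-MAX_DELTA, min(MAX_DELTA, delta))
    return round((clipped + MAX_DELTA) / STEP)


def dequantize(index: int) -> float:
    """Map a bin index back to the displacement at the bin centre."""
    return -MAX_DELTA + index * STEP


def tokenize(traj: Sequence[Point]) -> List[int]:
    """Encode a trajectory as one motion token per time step.

    Each step is quantized relative to the reconstructed position rather
    than the true previous position, so quantization error does not
    accumulate as long as every displacement stays within +/- MAX_DELTA.
    """
    x, y = traj[0]
    tokens = []
    for tx, ty in traj[1:]:
        ix, iy = quantize(tx - x), quantize(ty - y)
        x += dequantize(ix)
        y += dequantize(iy)
        tokens.append(ix * NUM_BINS + iy)  # one joint (x, y) token id
    return tokens


def detokenize(start: Point, tokens: Sequence[int]) -> List[Point]:
    """Decode motion tokens back into positions by summing the deltas."""
    x, y = start
    out = [(x, y)]
    for tok in tokens:
        ix, iy = divmod(tok, NUM_BINS)
        x += dequantize(ix)
        y += dequantize(iy)
        out.append((x, y))
    return out


def interleave(per_agent: Sequence[Sequence[int]]) -> List[int]:
    """Flatten per-agent token lists into one time-major joint sequence,
    [a0_t0, a1_t0, a0_t1, a1_t1, ...] (an assumed ordering)."""
    return [tok for step in zip(*per_agent) for tok in step]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]


def avg_log_prob(logits_seq: Sequence[Sequence[float]],
                 targets: Sequence[int]) -> float:
    """Average log probability of the target tokens; training maximizes this."""
    total = sum(log_softmax(logits)[t] for logits, t in zip(logits_seq, targets))
    return total / len(targets)
```

In this sketch, a trajectory such as `[(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)]` becomes the tokens `[111, 111]`, and `detokenize` recovers the same positions. Quantizing against the reconstructed position keeps each decoded point within half a bin of the original; quantizing raw deltas instead would let rounding errors add up over the horizon. Sampling from a trained model would proceed one token at a time over the interleaved sequence, appending each sampled token before predicting the next, so every agent's next step is conditioned on all agents' earlier steps.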


