MotionLM: Multi-Agent Motion Forecasting as Language Modeling

Abstract

Reliable forecasting of the future behavior of road agents is a critical component of safe planning in autonomous vehicles. Here, we represent continuous trajectories as sequences of discrete motion tokens and cast multi-agent motion prediction as a language modeling task over this domain. Our model, MotionLM, provides several advantages. First, it does not require anchors or explicit latent variable optimization to learn multimodal distributions. Instead, we leverage a single standard language modeling objective, maximizing the average log probability over sequence tokens. Second, our approach bypasses post-hoc interaction heuristics, in which individual agent trajectories are generated first and interactions are scored afterwards. Instead, MotionLM produces joint distributions over interactive agent futures in a single autoregressive decoding process. In addition, the model's sequential factorization enables temporally causal conditional rollouts. The proposed approach establishes new state-of-the-art performance for multi-agent motion prediction on the Waymo Open Motion Dataset, ranking 1st on the interactive challenge leaderboard.
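To make the idea of discrete motion tokens and the language modeling objective more concrete, the following is a minimal sketch, not the tokenizer or training code used in the paper. It assumes a uniform grid over per-step displacements, and the names and values `NUM_BINS`, `MAX_DELTA`, `tokenize`, `detokenize`, and `average_log_prob` are hypothetical choices made for illustration only.

```python
import math

# Hypothetical settings chosen for illustration, not taken from the paper.
NUM_BINS = 13      # bins per coordinate, so the vocabulary has NUM_BINS ** 2 tokens
MAX_DELTA = 6.0    # largest per-step displacement (in meters) the grid can represent
BIN_WIDTH = 2 * MAX_DELTA / (NUM_BINS - 1)


def quantize(delta):
    """Map a displacement to the nearest bin index, clipping to the grid range."""
    clipped = max(-MAX_DELTA, min(MAX_DELTA, delta))
    return int(round((clipped + MAX_DELTA) / BIN_WIDTH))


def dequantize(index):
    """Return the displacement at the center of a bin."""
    return index * BIN_WIDTH - MAX_DELTA


def tokenize(trajectory):
    """Turn a list of (x, y) waypoints into one discrete token per time step."""
    tokens = []
    for (x0, y0), (x1, y1) in zip(trajectory, trajectory[1:]):
        ix, iy = quantize(x1 - x0), quantize(y1 - y0)
        tokens.append(ix * NUM_BINS + iy)
    return tokens


def detokenize(tokens, start):
    """Rebuild an approximate trajectory by accumulating decoded displacements."""
    x, y = start
    trajectory = [(x, y)]
    for token in tokens:
        ix, iy = divmod(token, NUM_BINS)
        x, y = x + dequantize(ix), y + dequantize(iy)
        trajectory.append((x, y))
    return trajectory


def average_log_prob(logits_per_step, tokens):
    """Average log-probability of the target tokens under per-step softmax logits."""
    total = 0.0
    for logits, token in zip(logits_per_step, tokens):
        peak = max(logits)  # subtract the maximum for numerical stability
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits))
        total += logits[token] - log_norm
    return total / len(tokens)


if __name__ == "__main__":
    path = [(0, 0), (1, 0), (3, 1), (3, -1)]
    tokens = tokenize(path)
    print(tokens, detokenize(tokens, path[0]))
    uniform = [[0.0] * NUM_BINS ** 2 for _ in tokens]
    print(average_log_prob(uniform, tokens))
```

Training would maximize `average_log_prob` (equivalently, minimize its negative) with logits produced by a sequence model. In this sketch the logits are simply supplied by the caller. Displacements that fall between bin centers are rounded, so a reconstructed trajectory can drift from the original over long horizons.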
