Diverse Controllable Diffusion Policy with Signal Temporal Logic

Abstract

Generating realistic simulations is critical for autonomous system applications such as self-driving and human-robot interaction. However, current driving simulators still have difficulty generating controllable, diverse, and rule-compliant behaviors for road participants. Rule-based models cannot produce diverse behaviors and require careful tuning, whereas learning-based methods imitate the policy from data but are not designed to follow the rules explicitly. Moreover, real-world datasets are by nature "single-outcome": each recorded scene shows only one realized behavior, which makes it hard for learning methods to generate diverse behaviors. In this paper, we leverage Signal Temporal Logic (STL) and Diffusion Models to learn controllable, diverse, and rule-aware policies. We first calibrate the STL on real-world data, then generate diverse synthetic data using trajectory optimization, and finally learn the rectified diffusion policy on the augmented dataset. Tested on the NuScenes dataset, our approach achieves the most diverse rule-compliant trajectories among the compared baselines, with a runtime 1/17 that of the second-best approach. In closed-loop testing, our approach reaches the highest diversity, the highest rule satisfaction rate, and the lowest collision rate. Our method can generate varied characteristics conditioned on different STL parameters at test time. A case study on human-robot encounter scenarios shows that our approach can generate diverse and close-to-oracle trajectories. The annotation tool, augmented dataset, and code are available at https://github.com/mengyuest/pSTL-diffusion-policy.

