Diverse Controllable Diffusion Policy with Signal Temporal Logic

Abstract

Generating realistic simulations is critical for autonomous system applications such as self-driving and human-robot interaction. However, current driving simulators still have difficulty generating controllable, diverse, and rule-compliant behaviors for road participants: rule-based models cannot produce diverse behaviors and require careful tuning, whereas learning-based methods imitate the policy from data but are not designed to follow the rules explicitly. Moreover, real-world datasets are by nature "single-outcome": each scene records only the one behavior that actually occurred, which makes it hard for learning methods to generate diverse behaviors. In this paper, we leverage Signal Temporal Logic (STL) and Diffusion Models to learn a controllable, diverse, and rule-aware policy. We first calibrate the STL on the real-world data, then generate diverse synthetic data using trajectory optimization, and finally learn the rectified diffusion policy on the augmented dataset. On the NuScenes dataset, our approach achieves the most diverse rule-compliant trajectories among the compared baselines, with a runtime 1/17 that of the second-best approach. In closed-loop testing, our approach reaches the highest diversity, the highest rule satisfaction rate, and the lowest collision rate. At test time, our method can generate behaviors with varied characteristics conditioned on different STL parameters. A case study on human-robot encounter scenarios shows that our approach can generate diverse and close-to-oracle trajectories. The annotation tool, augmented dataset, and code are available at https://github.com/mengyuest/pSTL-diffusion-policy.
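The abstract does not spell out the STL rules themselves, so the following is only an illustrative sketch of what a parameterized STL rule and its robustness score might look like. It uses the standard quantitative semantics of STL (minimum over time for "always", maximum over time for "eventually", minimum for conjunction). The two rules, the function names, and the parameters `v_max` and `radius` are hypothetical and are not taken from the paper or its repository.

```python
from typing import Sequence


def always(margins: Sequence[float]) -> float:
    """Robustness of G(phi): the worst-case margin over the horizon."""
    return min(margins)


def eventually(margins: Sequence[float]) -> float:
    """Robustness of F(phi): the best-case margin over the horizon."""
    return max(margins)


def speed_limit_robustness(speeds: Sequence[float], v_max: float) -> float:
    """Hypothetical rule G(v_t <= v_max); positive means satisfied."""
    return always([v_max - v for v in speeds])


def reach_goal_robustness(distances: Sequence[float], radius: float) -> float:
    """Hypothetical rule F(d_t <= radius); positive means satisfied."""
    return eventually([radius - d for d in distances])


def rule_robustness(
    speeds: Sequence[float],
    distances: Sequence[float],
    v_max: float,
    radius: float,
) -> float:
    """Conjunction of both rules: the minimum of the two robustness scores."""
    return min(
        speed_limit_robustness(speeds, v_max),
        reach_goal_robustness(distances, radius),
    )


# One made-up trajectory, scored under two different parameter settings.
speeds = [3.0, 4.0, 5.0]        # speed at each time step (assumed m/s)
distances = [10.0, 4.0, 0.5]    # distance to goal at each time step (assumed m)

lenient = rule_robustness(speeds, distances, v_max=6.0, radius=1.0)
strict = rule_robustness(speeds, distances, v_max=4.5, radius=1.0)
```

In this sketch the same trajectory satisfies the rule under the lenient speed limit (positive score) and violates it under the strict one (negative score). This is the sense in which a rule's parameters, rather than its structure, can decide which behaviors count as compliant.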