Ctrl-Crash: Controllable Diffusion for Realistic Car Crashes
May 30, 2025
Authors: Anthony Gosselin, Ge Ya Luo, Luis Lara, Florian Golemo, Derek Nowrouzezahrai, Liam Paull, Alexia Jolicoeur-Martineau, Christopher Pal
cs.AI
Abstract
Video diffusion techniques have advanced significantly in recent years;
however, they struggle to generate realistic imagery of car crashes due to the
scarcity of accident events in most driving datasets. Improving traffic safety
requires realistic and controllable accident simulations. To tackle this
problem, we propose Ctrl-Crash, a controllable car crash video generation model
that conditions on signals such as bounding boxes, crash types, and an initial
image frame. Our approach enables counterfactual scenario generation where
minor variations in input can lead to dramatically different crash outcomes. To
support fine-grained control at inference time, we leverage classifier-free
guidance with independently tunable scales for each conditioning signal.
Compared to prior diffusion-based methods, Ctrl-Crash achieves
state-of-the-art performance on quantitative video quality metrics (e.g., FVD
and JEDi) as well as on qualitative measures of physical realism and video
quality based on human evaluation.
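The per-signal guidance described above can be illustrated with a minimal sketch. In standard classifier-free guidance, the guided noise estimate is the unconditional estimate plus a scaled difference toward the conditional one; a natural multi-condition generalization applies one tunable scale per conditioning signal. The function name, the additive combination rule, and the toy shapes below are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def multi_condition_cfg(eps_uncond, eps_conds, scales):
    """Combine an unconditional noise estimate with per-condition estimates.

    eps_uncond : noise prediction with all conditioning dropped
    eps_conds  : list of noise predictions, each with one signal active
                 (e.g., bounding boxes, crash type, initial frame)
    scales     : one guidance scale per conditioning signal

    NOTE: this additive combination is a common generalization of
    classifier-free guidance, assumed here for illustration.
    """
    eps = eps_uncond.copy()
    for eps_c, scale in zip(eps_conds, scales):
        eps += scale * (eps_c - eps_uncond)
    return eps

# Toy example: three conditioning signals with independent scales.
rng = np.random.default_rng(0)
shape = (4, 8, 8)  # stand-in for a latent video tensor
eps_u = rng.normal(size=shape)
eps_cs = [rng.normal(size=shape) for _ in range(3)]
guided = multi_condition_cfg(eps_u, eps_cs, scales=[1.5, 2.0, 0.5])
print(guided.shape)  # (4, 8, 8)
```

Setting a signal's scale to zero removes its influence entirely, while raising it strengthens adherence to that condition, which is what allows the fine-grained, per-signal control at inference time.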