Ctrl-Crash: Controllable Diffusion for Realistic Car Crashes

Abstract

Video diffusion techniques have advanced significantly in recent years, but they still struggle to generate realistic imagery of car crashes because accident events are scarce in most driving datasets. Improving traffic safety requires realistic and controllable accident simulations. To address this gap, we propose Ctrl-Crash, a controllable car crash video generation model conditioned on signals such as bounding boxes, crash types, and an initial image frame. Our approach enables counterfactual scenario generation, in which minor variations in the input can lead to dramatically different crash outcomes. To support fine-grained control at inference time, we use classifier-free guidance with an independently tunable scale for each conditioning signal. Compared to prior diffusion-based methods, Ctrl-Crash achieves state-of-the-art performance on quantitative video quality metrics (e.g., FVD and JEDi) and on qualitative measurements based on human evaluation of physical realism and video quality.
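
To illustrate what per-signal guidance scales can look like, here is a minimal sketch of one common way to combine classifier-free guidance terms. The abstract does not give the exact formulation, so the additive form, the function name `guided_noise`, and the condition names `bbox` and `crash_type` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def guided_noise(eps_uncond, eps_cond, scales):
    """Combine noise predictions using one guidance scale per condition.

    Assumed (hypothetical) form:
        eps = eps_uncond + sum_i scale_i * (eps_i - eps_uncond)
    where eps_i is the prediction with only condition i enabled.
    """
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    guided = eps_uncond.copy()
    for name, eps in eps_cond.items():
        # Each condition pushes the prediction along its own direction,
        # scaled independently of the other conditions.
        guided += scales[name] * (np.asarray(eps, dtype=float) - eps_uncond)
    return guided

# Toy example with stand-in arrays in place of real model outputs.
eps_uncond = np.zeros(4)
eps_cond = {"bbox": np.ones(4), "crash_type": 2.0 * np.ones(4)}
scales = {"bbox": 2.0, "crash_type": 0.5}
result = guided_noise(eps_uncond, eps_cond, scales)
```

In this sketch, setting a scale to zero removes that signal's influence entirely, while raising it strengthens adherence to that signal without changing the weight of the others.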