CARFF: Conditional Auto-encoded Radiance Field for 3D Scene Forecasting
January 31, 2024
Authors: Jiezhi Yang, Khushi Desai, Charles Packer, Harshil Bhatia, Nicholas Rhinehart, Rowan McAllister, Joseph Gonzalez
cs.AI
Abstract
We propose CARFF: Conditional Auto-encoded Radiance Field for 3D Scene
Forecasting, a method for predicting future 3D scenes given past observations,
such as 2D ego-centric images. Our method maps an image to a distribution over
plausible 3D latent scene configurations using a probabilistic encoder, and
predicts the evolution of the hypothesized scenes through time. Our latent
scene representation conditions a global Neural Radiance Field (NeRF) to
represent a 3D scene model, which enables explainable predictions and
straightforward downstream applications. This approach extends beyond previous
neural rendering work by considering complex scenarios of uncertainty in
environmental states and dynamics. We employ a two-stage training of
Pose-Conditional-VAE and NeRF to learn 3D representations. Additionally, we
auto-regressively predict latent scene representations as a partially
observable Markov decision process, utilizing a mixture density network. We
demonstrate the utility of our method in realistic scenarios using the CARLA
driving simulator, where CARFF can be used to enable efficient trajectory and
contingency planning in complex multi-agent autonomous driving scenarios
involving visual occlusions.
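To make the forecasting step concrete, below is a minimal NumPy sketch of the mixture-density idea described in the abstract: a network head maps the current latent scene representation to a K-component Gaussian mixture over the next latent, and sampling a component corresponds to committing to one scene hypothesis. The single linear layer, function names, and dimensions are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def mdn_forward(z_t, params, K, d):
    """Hypothetical MDN head: maps the current latent z_t (shape (d,))
    to a K-component diagonal-Gaussian mixture over the next latent.
    A single linear layer stands in for the real network (assumption)."""
    W, b = params
    out = W @ z_t + b                  # shape (K + 2*K*d,)
    logits = out[:K]
    pi = np.exp(logits - logits.max())
    pi /= pi.sum()                     # mixture weights (sum to 1)
    mu = out[K:K + K * d].reshape(K, d)            # component means
    sigma = np.exp(out[K + K * d:]).reshape(K, d)  # positive std-devs
    return pi, mu, sigma

def sample_next_latent(pi, mu, sigma, rng):
    """Pick one scene hypothesis, then sample the next latent from it."""
    k = rng.choice(len(pi), p=pi)
    return mu[k] + sigma[k] * rng.standard_normal(mu.shape[1])

# Toy usage with random weights (no trained model involved).
rng = np.random.default_rng(0)
K, d = 3, 8
W = rng.standard_normal((K + 2 * K * d, d)) * 0.1
b = np.zeros(K + 2 * K * d)
pi, mu, sigma = mdn_forward(rng.standard_normal(d), (W, b), K, d)
z_next = sample_next_latent(pi, mu, sigma, rng)
```

Rolling `sample_next_latent` forward auto-regressively (feeding `z_next` back in as `z_t`) yields one sampled future trajectory of latent scene states; each sampled latent could then condition the NeRF to render an explainable 3D hypothesis.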