

CARFF: Conditional Auto-encoded Radiance Field for 3D Scene Forecasting

January 31, 2024
作者: Jiezhi Yang, Khushi Desai, Charles Packer, Harshil Bhatia, Nicholas Rhinehart, Rowan McAllister, Joseph Gonzalez
cs.AI

Abstract

We propose CARFF: Conditional Auto-encoded Radiance Field for 3D Scene Forecasting, a method for predicting future 3D scenes given past observations, such as 2D ego-centric images. Our method maps an image to a distribution over plausible 3D latent scene configurations using a probabilistic encoder, and predicts the evolution of the hypothesized scenes through time. Our latent scene representation conditions a global Neural Radiance Field (NeRF) to represent a 3D scene model, which enables explainable predictions and straightforward downstream applications. This approach extends beyond previous neural rendering work by considering complex scenarios of uncertainty in environmental states and dynamics. We employ a two-stage training of Pose-Conditional-VAE and NeRF to learn 3D representations. Additionally, we auto-regressively predict latent scene representations as a partially observable Markov decision process, utilizing a mixture density network. We demonstrate the utility of our method in realistic scenarios using the CARLA driving simulator, where CARFF can be used to enable efficient trajectory and contingency planning in complex multi-agent autonomous driving scenarios involving visual occlusions.
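To make the autoregressive prediction step concrete, the sketch below shows how a mixture density network (MDN) can map a current latent scene code to a distribution over plausible next latents and sample one hypothesis per step. This is a minimal NumPy illustration under assumed dimensions, with a randomly initialized stand-in for a trained network; the names, sizes, and single-layer head are hypothetical, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 8   # hypothetical latent scene dimension
N_MIX = 3        # number of Gaussian mixture components

# Randomly initialized single-layer "MDN head" standing in for a trained network.
# It outputs N_MIX mixture logits, plus a mean and log-sigma vector per component.
W = rng.normal(0, 0.1, size=(LATENT_DIM, N_MIX * (2 * LATENT_DIM + 1)))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def mdn_step(z_t):
    """One autoregressive step: map the current latent z_t to mixture
    parameters over the next latent, then sample a hypothesized z_{t+1}."""
    out = z_t @ W
    logits = out[:N_MIX]
    means = out[N_MIX:N_MIX + N_MIX * LATENT_DIM].reshape(N_MIX, LATENT_DIM)
    log_sigmas = out[N_MIX + N_MIX * LATENT_DIM:].reshape(N_MIX, LATENT_DIM)
    pi = softmax(logits)
    k = rng.choice(N_MIX, p=pi)  # pick one scene hypothesis (e.g. occluded car present/absent)
    return means[k] + np.exp(log_sigmas[k]) * rng.normal(size=LATENT_DIM)

# Latent scene code, as would be sampled from the probabilistic encoder (PC-VAE).
z = rng.normal(size=LATENT_DIM)
rollout = [z]
for _ in range(5):  # forecast 5 steps ahead
    rollout.append(mdn_step(rollout[-1]))
print(len(rollout), rollout[-1].shape)
```

In the full method, each sampled latent would condition the global NeRF to render a predicted 3D scene, and sampling different mixture components yields the distinct scene hypotheses used for contingency planning.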