Splatter Image: Ultra-Fast Single-View 3D Reconstruction

Abstract

We introduce the Splatter Image, an ultra-fast approach for monocular 3D object reconstruction which operates at 38 FPS. Splatter Image is based on Gaussian Splatting, which has recently brought real-time rendering, fast training, and excellent scaling to multi-view reconstruction. For the first time, we apply Gaussian Splatting in a monocular reconstruction setting. Our approach is learning-based, and, at test time, reconstruction only requires the feed-forward evaluation of a neural network. The main innovation of Splatter Image is its surprisingly straightforward design: it uses a 2D image-to-image network to map the input image to one 3D Gaussian per pixel. The resulting Gaussians thus have the form of an image, the Splatter Image. We further extend the method to take more than one image as input by adding cross-view attention. Owing to the speed of the renderer (588 FPS), we can use a single GPU for training while generating entire images at each iteration in order to optimize perceptual metrics like LPIPS. On standard benchmarks, we demonstrate not only fast reconstruction but also better results than recent and much more expensive baselines in terms of PSNR, LPIPS, and other metrics.
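To make the "one 3D Gaussian per pixel" idea concrete, here is a minimal sketch of how the output of an image-to-image network could be read as a set of Gaussians. The abstract does not give the parameterisation, so everything here is an assumption made for illustration: the 15-channel layout (opacity, depth, 3D offset, log-scale, rotation quaternion, RGB), the pinhole camera with a single focal length, and the names `Gaussian3D`, `decode_pixel` and `splatter_image_to_gaussians`. The network itself is not modelled; the input is just an H x W grid of channel vectors.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical channel layout (an assumption, not taken from the abstract):
# [0] opacity logit, [1] log-depth, [2:5] xyz offset,
# [5:8] log-scale, [8:12] quaternion, [12:15] RGB logits.
NUM_CHANNELS = 15


@dataclass
class Gaussian3D:
    mean: Tuple[float, ...]      # 3D centre in camera coordinates
    opacity: float               # in (0, 1)
    scale: Tuple[float, ...]     # positive per-axis extent
    rotation: Tuple[float, ...]  # unit quaternion (w, x, y, z)
    color: Tuple[float, ...]     # RGB in (0, 1)


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_pixel(u: int, v: int, ch: Sequence[float],
                 width: int, height: int, focal: float) -> Gaussian3D:
    """Turn the channel vector stored at pixel (u, v) into one 3D Gaussian."""
    if len(ch) != NUM_CHANNELS:
        raise ValueError(f"expected {NUM_CHANNELS} channels, got {len(ch)}")
    opacity = _sigmoid(ch[0])
    depth = math.exp(ch[1])  # exp keeps the depth positive
    # Ray through the pixel centre for an assumed pinhole camera.
    ray = ((u + 0.5 - width / 2.0) / focal,
           (v + 0.5 - height / 2.0) / focal,
           1.0)
    # Centre = point at the predicted depth along the ray, plus a free offset.
    mean = tuple(depth * r + o for r, o in zip(ray, ch[2:5]))
    scale = tuple(math.exp(s) for s in ch[5:8])
    quat = ch[8:12]
    norm = math.sqrt(sum(c * c for c in quat))
    # Fall back to the identity rotation if the quaternion is all zeros.
    rotation = tuple(c / norm for c in quat) if norm > 0 else (1.0, 0.0, 0.0, 0.0)
    color = tuple(_sigmoid(c) for c in ch[12:15])
    return Gaussian3D(mean, opacity, scale, rotation, color)


def splatter_image_to_gaussians(
        splatter: Sequence[Sequence[Sequence[float]]],
        focal: float) -> List[Gaussian3D]:
    """Flatten an H x W x C grid into a list of H * W Gaussians (row-major)."""
    height, width = len(splatter), len(splatter[0])
    return [decode_pixel(u, v, splatter[v][u], width, height, focal)
            for v in range(height) for u in range(width)]
```

Under these assumptions, a 128 x 128 grid would yield 16,384 Gaussians from a single forward pass, which could then be handed to a Gaussian Splatting renderer. Tying each Gaussian to a pixel ray is one plausible way to give the 2D network a sensible default placement; the offset channels let a Gaussian move off its ray.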
