Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction

Abstract

We tackle the challenge of efficiently reconstructing a 3D asset from a single image, motivated by the growing demand for automated 3D content creation pipelines. Previous methods rely primarily on Score Distillation Sampling (SDS) and Neural Radiance Fields (NeRF). Despite their significant success, these approaches face practical limitations due to lengthy optimization and considerable memory usage. In this report, we introduce Gamba, an end-to-end amortized 3D reconstruction model for single-view images, built on two main insights: (1) 3D representation: leveraging a large number of 3D Gaussians for an efficient 3D Gaussian splatting process; (2) backbone design: introducing a Mamba-based sequential network that supports context-dependent reasoning and scales linearly with sequence (token) length, which allows it to process a substantial number of Gaussians. Gamba also incorporates significant advancements in data preprocessing, regularization design, and training methodology. We evaluated Gamba against existing optimization-based and feed-forward 3D generation approaches on the real-world scanned OmniObject3D dataset. Gamba demonstrates competitive generation capabilities, both qualitatively and quantitatively, while achieving remarkable speed: approximately 0.6 seconds on a single NVIDIA A100 GPU.
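The linear scaling claimed for the Mamba backbone comes from its recurrence: each token updates a fixed-size hidden state once, instead of being compared with every other token as in self-attention. The sketch below is a minimal, illustrative single-layer selective state-space scan in pure Python, loosely following the recurrence used in the original Mamba reference scan (h ← exp(Δ·A)·h + Δ·B_t·u, y = C_t·h + D·u, with Δ, B_t and C_t computed from the current token). It is not Gamba's backbone code; the function and parameter names (`selective_scan`, `A`, `W_B`, `W_C`, `w_delta`, `b_delta`, `D_skip`) are hypothetical, and real Mamba layers run this scan as a parallel GPU kernel rather than a Python loop.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def softplus(z: float) -> float:
    # Numerically stable log(1 + e^z); keeps the step size delta positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(
    tokens: Sequence[Vector],  # L tokens, each a D-dimensional vector u_t
    A: Sequence[Vector],       # D x N diagonal state entries (negative for stability)
    W_B: Sequence[Vector],     # hypothetical N x D projection producing B_t from u_t
    W_C: Sequence[Vector],     # hypothetical N x D projection producing C_t from u_t
    w_delta: Vector,           # hypothetical per-channel weight for the step size
    b_delta: float,            # hypothetical shared bias for the step size
    D_skip: Vector,            # per-channel skip connection D
) -> List[List[float]]:
    """Causal selective state-space scan; cost is O(L * D * N), i.e. linear in L."""
    d_model, n_state = len(A), len(A[0])
    h = [[0.0] * n_state for _ in range(d_model)]  # one N-dim hidden state per channel
    outputs = []
    for u in tokens:
        # "Selective" parameters: computed from the current token only.
        B_t = [sum(w * x for w, x in zip(row, u)) for row in W_B]
        C_t = [sum(w * x for w, x in zip(row, u)) for row in W_C]
        y = []
        for c in range(d_model):
            delta = softplus(w_delta[c] * u[c] + b_delta)
            acc = D_skip[c] * u[c]
            for i in range(n_state):
                # Discretized update: decay old state, inject the current token.
                h[c][i] = math.exp(delta * A[c][i]) * h[c][i] + delta * B_t[i] * u[c]
                acc += C_t[i] * h[c][i]
            y.append(acc)
        outputs.append(y)
    return outputs


if __name__ == "__main__":
    # One channel, one state: first output is log(2) (about 0.693), second is 0.0.
    print(selective_scan([[1.0], [0.0]], [[-1.0]], [[1.0]], [[1.0]], [0.0], 0.0, [0.0]))
```

Because every token touches a state of fixed size N per channel, doubling the number of tokens roughly doubles the work, whereas full self-attention grows quadratically. Under these assumptions, that is the property that makes a Mamba-style backbone practical for long token sequences such as a large set of Gaussians.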
