SceNeRFlow: Time-Consistent Reconstruction of General Dynamic Scenes

Abstract

Existing methods for the 4D reconstruction of general, non-rigidly deforming objects focus on novel-view synthesis and neglect correspondences. However, time consistency enables advanced downstream tasks such as 3D editing, motion analysis, and virtual-asset creation. We propose SceNeRFlow to reconstruct a general, non-rigid scene in a time-consistent manner. Our dynamic-NeRF method takes as input multi-view RGB videos and background images from static cameras with known camera parameters. It then reconstructs, in an online fashion, the deformations of an estimated canonical model of the geometry and appearance. Since this canonical model is time-invariant, we obtain correspondences even for long-term, long-range motions. We employ neural scene representations to parametrize the components of our method. Like prior dynamic-NeRF methods, we use a backwards deformation model. We find non-trivial adaptations of this model necessary to handle larger motions: we decompose the deformations into a strongly regularized coarse component and a weakly regularized fine component, where the coarse component also extends the deformation field into the space surrounding the object, which enables tracking over time. We show experimentally that, unlike prior work that only handles small motions, our method enables the reconstruction of studio-scale motions.
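
To make the backwards deformation idea concrete, the following is a minimal toy sketch in NumPy, not the authors' implementation. It assumes the coarse and fine components are simply added as offsets to a query point, which the abstract does not specify. All function names (canonical_density, coarse_offset, fine_offset, to_canonical, density_at) and the closed-form toy fields are hypothetical stand-ins for the neural scene representations the method actually uses.

```python
import numpy as np

def canonical_density(x_canonical):
    """Toy time-invariant canonical model: a Gaussian blob centered at the origin."""
    return np.exp(-np.sum(x_canonical ** 2, axis=-1))

def coarse_offset(x, t):
    """Toy coarse component (assumption): a smooth offset defined everywhere in
    space, including around the object. Here it undoes a translation of the
    object by distance t along the x-axis."""
    return np.broadcast_to(np.array([-t, 0.0, 0.0]), x.shape)

def fine_offset(x, t):
    """Toy fine component (assumption): a small, bounded detail offset."""
    return 0.01 * t * np.sin(x)

def to_canonical(x, t):
    """Backwards deformation: map a point observed at time t into canonical space."""
    return x + coarse_offset(x, t) + fine_offset(x, t)

def density_at(x, t):
    """Query the deformed scene at time t by looking up the canonical model."""
    return canonical_density(to_canonical(x, t))

# At time t = 2 the toy object has moved to roughly x = 2, so the density is
# high there and low at the origin, where the object was at t = 0.
moved_center = np.array([2.0, 0.0, 0.0])
print(density_at(moved_center, 2.0))
print(density_at(np.zeros(3), 2.0))
```

In this sketch, points observed at different times that map to the same canonical location are in correspondence, which illustrates why a time-invariant canonical model yields correspondences over long time spans.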
