HumanRF: High-Fidelity Neural Radiance Fields for Humans in Motion

Abstract

Representing human performance with high fidelity is an essential building block in diverse applications, such as film production, computer games, or videoconferencing. To close the gap to production-level quality, we introduce HumanRF, a 4D dynamic neural scene representation that captures full-body appearance in motion from multi-view video input and enables playback from novel, unseen viewpoints. Our novel representation acts as a dynamic video encoding that captures fine details at high compression rates by factorizing space-time into a temporal matrix-vector decomposition. This allows us to obtain temporally coherent reconstructions of human actors for long sequences, while representing high-resolution details even in the context of challenging motion. While most research focuses on synthesizing at resolutions of 4MP or lower, we address the challenge of operating at 12MP. To this end, we introduce ActorsHQ, a novel multi-view dataset that provides 12MP footage from 160 cameras for 16 sequences with high-fidelity, per-frame mesh reconstructions. We demonstrate challenges that emerge from using such high-resolution data and show that our newly introduced HumanRF effectively leverages this data, making a significant step towards production-level quality in novel view synthesis.
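The abstract does not spell out the factorization. As a rough illustration of why a matrix-vector-style factorization of space-time can compress well, the Python sketch below assumes a TensoRF-like scheme extended to 4D: a feature field over (x, y, z, t) is written as a sum of four products, each pairing a 3D grid over three of the axes with a 1D vector over the remaining axis. The class and function names, the dense grid storage, the random values standing in for learned parameters, and the integer-index lookup (no interpolation) are all hypothetical simplifications; HumanRF's actual representation may differ.

```python
import random
from typing import List

Feature = List[float]  # one feature vector with f channels


def dense_param_count(n: int, t: int, f: int) -> int:
    """Floats stored by a dense 4D grid: n^3 voxels x t frames x f channels."""
    return n ** 3 * t * f


def factorized_param_count(n: int, t: int, f: int) -> int:
    """Floats stored by the four (3D volume, 1D vector) pairs used below."""
    volumes = n ** 3 + 3 * n ** 2 * t  # xyz volume, plus xyt, xzt, yzt volumes
    vectors = t + 3 * n                # t vector, plus z, y, x vectors
    return f * (volumes + vectors)


class FactorizedSpaceTimeGrid:
    """Hypothetical 4D feature grid, factorized as

        F(x,y,z,t) = V_xyz(x,y,z)*v_t(t) + V_xyt(x,y,t)*v_z(z)
                   + V_xzt(x,z,t)*v_y(y) + V_yzt(y,z,t)*v_x(x)

    where * is an element-wise product over the f feature channels.
    Dense nested lists and integer grid indices (no interpolation)
    keep the sketch short; a real model would learn these values."""

    def __init__(self, n: int, t: int, f: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.f = f

        def vec(length: int) -> List[Feature]:
            return [[rng.uniform(-1.0, 1.0) for _ in range(f)] for _ in range(length)]

        def vol(a: int, b: int, c: int) -> List[List[List[Feature]]]:
            return [[vec(c) for _ in range(b)] for _ in range(a)]

        # Random values stand in for learned parameters.
        self.vol_xyz, self.vec_t = vol(n, n, n), vec(t)
        self.vol_xyt, self.vec_z = vol(n, n, t), vec(n)
        self.vol_xzt, self.vec_y = vol(n, n, t), vec(n)
        self.vol_yzt, self.vec_x = vol(n, n, t), vec(n)

    def query(self, x: int, y: int, z: int, frame: int) -> Feature:
        """Feature vector at voxel (x, y, z) in the given frame."""
        return [
            self.vol_xyz[x][y][z][k] * self.vec_t[frame][k]
            + self.vol_xyt[x][y][frame][k] * self.vec_z[z][k]
            + self.vol_xzt[x][z][frame][k] * self.vec_y[y][k]
            + self.vol_yzt[y][z][frame][k] * self.vec_x[x][k]
            for k in range(self.f)
        ]

    def num_params(self) -> int:
        """Count the floats actually stored, to cross-check the formula."""
        def count(node) -> int:
            return 1 if isinstance(node, float) else sum(count(c) for c in node)

        parts = [self.vol_xyz, self.vec_t, self.vol_xyt, self.vec_z,
                 self.vol_xzt, self.vec_y, self.vol_yzt, self.vec_x]
        return sum(count(p) for p in parts)


if __name__ == "__main__":
    grid = FactorizedSpaceTimeGrid(n=8, t=5, f=4)
    print(len(grid.query(1, 2, 3, 4)))                            # → 4
    print(grid.num_params() == factorized_param_count(8, 5, 4))  # → True
    dense = dense_param_count(128, 1000, 1)
    factorized = factorized_param_count(128, 1000, 1)
    print(dense, factorized)                                      # → 2097152000 51250536
    print(round(dense / factorized, 1))                           # → 40.9
```

Under these assumptions, a 128³ grid over 1,000 frames with one channel needs about 2.1 billion floats when stored densely but about 51 million when factorized, roughly a 41× reduction. Because the three space-time volumes dominate the factorized count, the ratio approaches n/3 as the number of frames grows, so finer spatial grids gain more from the factorization.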