BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion

Abstract

We show, for the first time, that neural networks trained only on synthetic data achieve state-of-the-art accuracy on the problem of 3D human pose and shape (HPS) estimation from real images. Previous synthetic datasets have been small, unrealistic, or lacked realistic clothing. Achieving sufficient realism is non-trivial, and we show how to do this for full bodies in motion. Specifically, our BEDLAM dataset contains monocular RGB videos with ground-truth 3D bodies in SMPL-X format. It includes a diversity of body shapes, motions, skin tones, hair, and clothing. The clothing is realistically simulated on the moving bodies using commercial clothing physics simulation. We render varying numbers of people in realistic scenes with varied lighting and camera motions. We then train various HPS regressors using BEDLAM and achieve state-of-the-art accuracy on real-image benchmarks despite training with synthetic data. We use BEDLAM to gain insights into which model design choices are important for accuracy. With good synthetic training data, we find that a basic method like HMR approaches the accuracy of the current state-of-the-art (SOTA) method, CLIFF. BEDLAM is useful for a variety of tasks; all images, ground-truth bodies, 3D clothing, support code, and more are available for research purposes. Additionally, we provide detailed information about our synthetic data generation pipeline, enabling others to generate their own datasets. See the project page: https://bedlam.is.tue.mpg.de/.
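The abstract compares HPS regressors by their "accuracy" on real-image benchmarks but does not name the metric. As an assumption for illustration, HPS work commonly reports the mean per-joint position error (MPJPE) and its Procrustes-aligned variant (PA-MPJPE). The minimal NumPy sketch below shows how these two errors are typically computed for one predicted 3D skeleton. It is not the BEDLAM evaluation code: the function names `mpjpe`, `procrustes_align`, and `pa_mpjpe` are hypothetical, and joints are assumed to be `(J, 3)` arrays in the same units and the same joint order.

```python
import numpy as np


def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean per-joint position error: the average Euclidean distance between
    corresponding joints. pred and gt have shape (J, 3) in the same units."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def procrustes_align(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Map pred onto gt with the similarity transform (scale, rotation,
    translation) that minimizes the summed squared joint error
    (orthogonal Procrustes with scale, as in Umeyama's method)."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g            # center both point sets
    # The SVD of the 3x3 cross-covariance gives the optimal rotation.
    u, s, vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(u @ vt))
    flip = np.diag([1.0, 1.0, d])
    rot = u @ flip @ vt                      # applied to row vectors: p @ rot
    scale = float((s * np.diag(flip)).sum() / (p ** 2).sum())
    return scale * p @ rot + mu_g


def pa_mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """MPJPE after Procrustes alignment of the prediction to the ground truth."""
    return mpjpe(procrustes_align(pred, gt), gt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.normal(size=(14, 3))            # 14 hypothetical joints
    theta = 0.3                              # rotation angle about the z-axis
    rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                      [np.sin(theta), np.cos(theta), 0.0],
                      [0.0, 0.0, 1.0]])
    # A prediction with the right articulation but wrong scale, rotation and offset.
    pred = 1.5 * gt @ rot_z.T + np.array([0.1, -0.2, 0.5])
    print("MPJPE:   ", mpjpe(pred, gt))      # large: global errors are penalized
    print("PA-MPJPE:", pa_mpjpe(pred, gt))   # near zero: alignment removes them
```

Because PA-MPJPE factors out global scale, rotation, and translation, it isolates errors in the articulated pose, while plain MPJPE also penalizes a misplaced or mis-scaled body.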
