BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion

Abstract

We show, for the first time, that neural networks trained only on synthetic data achieve state-of-the-art accuracy on the problem of 3D human pose and shape (HPS) estimation from real images. Previous synthetic datasets have been small, unrealistic, or lacked realistic clothing. Achieving sufficient realism is non-trivial and we show how to do this for full bodies in motion. Specifically, our BEDLAM dataset contains monocular RGB videos with ground-truth 3D bodies in SMPL-X format. It includes a diversity of body shapes, motions, skin tones, hair, and clothing. The clothing is realistically simulated on the moving bodies using commercial clothing physics simulation. We render varying numbers of people in realistic scenes with varied lighting and camera motions. We then train various HPS regressors using BEDLAM and achieve state-of-the-art accuracy on real-image benchmarks despite training with synthetic data. We use BEDLAM to gain insights into what model design choices are important for accuracy. With good synthetic training data, we find that a basic method like HMR approaches the accuracy of the current SOTA method (CLIFF). BEDLAM is useful for a variety of tasks and all images, ground truth bodies, 3D clothing, support code, and more are available for research purposes. Additionally, we provide detailed information about our synthetic data generation pipeline, enabling others to generate their own datasets. See the project page: https://bedlam.is.tue.mpg.de/.