

MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures

December 5, 2023
Authors: Zhangyang Xiong, Chenghong Li, Kenkun Liu, Hongjie Liao, Jianqiao Hu, Junyi Zhu, Shuliang Ning, Lingteng Qiu, Chongjie Wang, Shijie Wang, Shuguang Cui, Xiaoguang Han
cs.AI

Abstract

In this era, the success of large language models and text-to-image models can be attributed to the driving force of large-scale datasets. However, in the realm of 3D vision, while remarkable progress has been made with models trained on large-scale synthetic and real-captured object data like Objaverse and MVImgNet, a similar level of progress has not been observed in the domain of human-centric tasks, partly due to the lack of a large-scale human dataset. Existing datasets of high-fidelity 3D human capture remain mid-sized because of the significant challenges in acquiring large-scale, high-quality 3D human data. To bridge this gap, we present MVHumanNet, a dataset comprising multi-view human action sequences of 4,500 human identities. The primary focus of our work is on collecting human data that features a large number of diverse identities and everyday clothing using a multi-view human capture system, which facilitates easily scalable data collection. Our dataset contains 9,000 daily outfits, 60,000 motion sequences, and 645 million frames with extensive annotations, including human masks, camera parameters, 2D and 3D keypoints, SMPL/SMPL-X parameters, and corresponding textual descriptions. To explore the potential of MVHumanNet in various 2D and 3D visual tasks, we conducted pilot studies on view-consistent action recognition, human NeRF reconstruction, text-driven view-unconstrained human image generation, as well as 2D view-unconstrained human image and 3D avatar generation. Extensive experiments demonstrate the performance improvements and effective applications enabled by the scale of MVHumanNet. As the current largest-scale 3D human dataset, we hope that the release of MVHumanNet data with annotations will foster further innovation in 3D human-centric tasks at scale.
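The abstract lists per-view camera parameters alongside 2D and 3D keypoint annotations. In multi-view datasets of this kind, the two are typically related by standard pinhole projection: the 3D keypoints, projected through each camera's extrinsics and intrinsics, should land on the corresponding 2D keypoints. The sketch below illustrates that relationship with made-up camera values; the actual MVHumanNet file formats, parameter conventions, and loading APIs are not specified in this abstract, so all names and numbers here are hypothetical.

```python
import numpy as np

def project_points(X_world, K, R, t):
    """Project Nx3 world-space 3D keypoints into a camera image.

    K: 3x3 intrinsics, R: 3x3 world-to-camera rotation, t: (3,) translation.
    This is generic pinhole projection, not a documented MVHumanNet API.
    """
    X_cam = X_world @ R.T + t          # world -> camera coordinates
    x = X_cam @ K.T                    # apply intrinsics
    return x[:, :2] / x[:, 2:3]        # perspective divide -> pixel coords

# Toy camera and joints (illustrative values only, not real dataset annotations)
K = np.array([[1000.0,    0.0, 512.0],
              [   0.0, 1000.0, 512.0],
              [   0.0,    0.0,   1.0]])
R = np.eye(3)                          # camera axes aligned with world axes
t = np.array([0.0, 0.0, 3.0])          # subject 3 m in front of the camera
joints_3d = np.array([[0.0, 0.0, 0.0],   # e.g. pelvis at the world origin
                      [0.0, 0.5, 0.0]])  # a joint 0.5 m along the y-axis
print(project_points(joints_3d, K, R, t))
```

With annotations like these, the same routine applied per camera gives a quick consistency check: the projections of the shared 3D keypoints can be compared against each view's annotated 2D keypoints.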