InfiniHuman: Infinite 3D Human Creation with Precise Control
October 13, 2025
Authors: Yuxuan Xue, Xianghui Xie, Margaret Kostyrko, Gerard Pons-Moll
cs.AI
Abstract
Generating realistic and controllable 3D human avatars is a long-standing
challenge, particularly when covering broad attribute ranges such as ethnicity,
age, clothing styles, and detailed body shapes. Capturing and annotating
large-scale human datasets for training generative models is prohibitively
expensive and limited in scale and diversity. The central question we address
in this paper is: Can existing foundation models be distilled to generate
theoretically unbounded, richly annotated 3D human data? We introduce
InfiniHuman, a framework that synergistically distills these models to produce
richly annotated human data at minimal cost and with theoretically unlimited
scalability. We propose InfiniHumanData, a fully automatic pipeline that
leverages vision-language and image generation models to create a large-scale
multi-modal dataset. A user study shows that our automatically generated
identities are indistinguishable from scan renderings. InfiniHumanData contains 111K
identities spanning unprecedented diversity. Each identity is annotated with
multi-granularity text descriptions, multi-view RGB images, detailed clothing
images, and SMPL body-shape parameters. Building on this dataset, we propose
InfiniHumanGen, a diffusion-based generative pipeline conditioned on text, body
shape, and clothing assets. InfiniHumanGen enables fast, realistic, and
precisely controllable avatar generation. Extensive experiments demonstrate
significant improvements over state-of-the-art methods in visual quality,
generation speed, and controllability. Our approach enables high-quality avatar
generation with fine-grained control at effectively unbounded scale through a
practical and affordable solution. We will publicly release the automatic data
generation pipeline, the comprehensive InfiniHumanData dataset, and the
InfiniHumanGen models at https://yuxuan-xue.com/infini-human.
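
As a rough illustration of the per-identity annotations listed above (multi-granularity text, multi-view RGB images, clothing images, and SMPL body-shape parameters), one identity could be held in a record like the sketch below. This is not the released data format: the class name `IdentityRecord`, its field names, the helper `conditioning_inputs`, and the choice of a 10-coefficient shape vector are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumption: a 10-coefficient SMPL shape vector, a commonly used size.
# The abstract does not state how many shape parameters are stored.
NUM_SHAPE_PARAMS = 10


@dataclass
class IdentityRecord:
    """Hypothetical container for the annotations of one identity."""

    identity_id: str
    # Granularity label (e.g. "short", "detailed") -> text description.
    captions: Dict[str, str]
    # Paths to the multi-view RGB images of this identity.
    view_image_paths: List[str]
    # Paths to the detailed clothing images of this identity.
    clothing_image_paths: List[str]
    # SMPL body-shape coefficients; defaults to the all-zero (mean) shape.
    smpl_betas: List[float] = field(
        default_factory=lambda: [0.0] * NUM_SHAPE_PARAMS
    )

    def validate(self) -> None:
        """Raise ValueError if a required annotation is missing or malformed."""
        if not self.captions:
            raise ValueError("at least one text description is required")
        if not self.view_image_paths:
            raise ValueError("at least one RGB view is required")
        if len(self.smpl_betas) != NUM_SHAPE_PARAMS:
            raise ValueError(
                f"expected {NUM_SHAPE_PARAMS} shape parameters, "
                f"got {len(self.smpl_betas)}"
            )


def conditioning_inputs(record: IdentityRecord) -> dict:
    """Group the three condition types named in the abstract:
    text, body shape, and clothing assets."""
    record.validate()
    return {
        "text": record.captions,
        "body_shape": record.smpl_betas,
        "clothing": record.clothing_image_paths,
    }
```

A record built this way can be checked with `validate()` before it is passed to any downstream code, which keeps malformed entries out of a training or inference loop.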