FrankenMotion: Part-level Human Motion Generation and Composition
January 15, 2026
Authors: Chuqiao Li, Xianghui Xie, Yong Cao, Andreas Geiger, Gerard Pons-Moll
cs.AI
Abstract
Human motion generation from text prompts has made remarkable progress in recent years. However, because fine-grained, part-level motion annotations are lacking, existing methods rely primarily on sequence-level or action-level descriptions, which limits their controllability over individual body parts. In this work, we leverage the reasoning capabilities of large language models (LLMs) to construct a high-quality motion dataset with atomic, temporally aware, part-level text annotations. Unlike prior datasets that either provide synchronized part captions over fixed time segments or rely solely on global sequence labels, our dataset captures asynchronous and semantically distinct part movements at fine temporal resolution. Based on this dataset, we introduce FrankenMotion, a diffusion-based, part-aware motion generation framework in which each body part is guided by its own temporally structured textual prompt. To our knowledge, this is the first work to provide atomic, temporally aware, part-level motion annotations together with a model that enables motion generation with both spatial (body part) and temporal (atomic action) control. Experiments demonstrate that FrankenMotion outperforms all previous baseline models adapted and retrained for our setting, and that it can compose motions unseen during training. Our code and dataset will be made publicly available upon publication.
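
To make the idea of asynchronous, part-level annotation concrete, the following is a minimal illustrative sketch in Python. It is not the authors' data format or code, which had not been released at the time of writing: the part names in `PARTS`, the classes `AtomicSegment` and `PartAnnotation`, the frame-based indexing, and the example captions are all assumptions made for illustration. The sketch only shows how each body part could carry its own timeline of atomic action descriptions whose boundaries need not align with those of other parts.

```python
from dataclasses import dataclass, field

# Hypothetical body-part vocabulary; the abstract does not specify the partition.
PARTS = ("head", "left_arm", "right_arm", "torso", "legs")


@dataclass(frozen=True)
class AtomicSegment:
    start: int  # first frame of the segment (inclusive)
    end: int    # frame after the last frame of the segment (exclusive)
    text: str   # atomic action description for one body part


@dataclass
class PartAnnotation:
    num_frames: int
    # Maps a part name to its own list of segments (one timeline per part).
    segments: dict[str, list[AtomicSegment]] = field(default_factory=dict)

    def add(self, part: str, start: int, end: int, text: str) -> None:
        """Attach an atomic action to one part over frames [start, end)."""
        if part not in PARTS:
            raise ValueError(f"unknown part: {part}")
        if not 0 <= start < end <= self.num_frames:
            raise ValueError("segment must lie within the sequence")
        self.segments.setdefault(part, []).append(AtomicSegment(start, end, text))

    def prompts_at(self, frame: int) -> dict[str, str]:
        """Return the text prompt active for each annotated part at a frame."""
        active = {}
        for part, segs in self.segments.items():
            for seg in segs:
                if seg.start <= frame < seg.end:
                    active[part] = seg.text
        return active


# Example: the arm and the legs follow different, unaligned timelines.
ann = PartAnnotation(num_frames=120)
ann.add("right_arm", 0, 40, "raise the right arm")
ann.add("right_arm", 40, 90, "wave the right hand")
ann.add("legs", 20, 120, "walk forward")
```

In this sketch, querying `ann.prompts_at(30)` yields prompts for both the right arm and the legs, while frame 100 has only a legs prompt. Leaving a part without a prompt at a given frame is one possible way to represent unconstrained parts; how the actual model handles unannotated parts is not stated in the abstract.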