DrawMotion: Generating 3D Human Motion through Freehand Drawing

Abstract

Text-to-motion generation translates textual descriptions into human motions, but users often struggle to convey an intended motion precisely through text alone. To address this issue, this paper introduces DrawMotion, an efficient diffusion-based framework designed for multi-condition scenarios. DrawMotion generates motions from both a conventional text condition and a novel freehand drawing condition, which provide semantic and spatial control over the generated motions, respectively. We tackle the fine-grained motion generation task from three perspectives. 1) Freehand drawing condition: to capture users' intended motions accurately without requiring tedious textual input, we develop an algorithm that automatically generates hand-drawn stickman sketches across different dataset formats. 2) Multi-condition fusion: we propose a Multi-Condition Module (MCM), integrated into the diffusion process, that enables the model to exploit all possible condition combinations while reducing computational complexity compared with conventional approaches. 3) Training-free guidance: the MCM in DrawMotion keeps its intermediate features in a continuous space, so classifier-guidance gradients can update these features, aligning the generated motions with user intentions while preserving fidelity. Quantitative experiments and user studies show that the freehand drawing approach saves users approximately 46.7% of the time needed to generate motions that match their imagination. The code, demos, and relevant data are publicly available at https://github.com/InvertedForest/DrawMotion.
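The training-free guidance idea can be illustrated with a toy sketch: because the features are continuous, the gradient of a differentiable objective can nudge them directly, with no retraining. The code below is only an illustration under stated assumptions and is not the DrawMotion implementation. The linear `decode` map, the squared-error `guidance_loss`, the step size, and all variable names are hypothetical stand-ins for the actual MCM features and guidance objective.

```python
# Toy illustration only (not the authors' code): a hypothetical linear
# "decoder" maps a continuous feature vector to flat 2D coordinates, and a
# squared-error loss against a target sketch acts as the guidance objective.

def decode(feature, weights):
    """Map a feature vector to flat 2D coordinates via an assumed linear map."""
    return [sum(w * f for w, f in zip(row, feature)) for row in weights]


def guidance_loss(feature, weights, target):
    """Squared distance between decoded coordinates and the target sketch."""
    pred = decode(feature, weights)
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def guidance_grad(feature, weights, target):
    """Analytic gradient of guidance_loss with respect to the feature."""
    pred = decode(feature, weights)
    residual = [p - t for p, t in zip(pred, target)]
    return [
        2.0 * sum(residual[i] * weights[i][j] for i in range(len(weights)))
        for j in range(len(feature))
    ]


def guided_update(feature, weights, target, step_size=0.1, num_steps=50):
    """Move the feature along the negative guidance gradient; no weights are trained."""
    feature = list(feature)
    for _ in range(num_steps):
        grad = guidance_grad(feature, weights, target)
        feature = [f - step_size * g for f, g in zip(feature, grad)]
    return feature


weights = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]  # hypothetical decoder weights
target = [0.3, -0.2]                           # hypothetical sketch coordinates
feature = [0.0, 0.0, 0.0]                      # hypothetical intermediate feature

initial_loss = guidance_loss(feature, weights, target)
guided = guided_update(feature, weights, target)
final_loss = guidance_loss(guided, weights, target)
```

In this toy setting the loss shrinks with every step because the feature space is continuous and the objective is differentiable. With discrete tokens, the same gradient update would not be directly applicable.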