DrawMotion: Generating 3D Human Motion from Hand Drawings

Abstract

Text-to-motion generation translates textual descriptions into human motions, but users often struggle to convey an intended motion precisely through text alone. To address this issue, this paper introduces DrawMotion, an efficient diffusion-based framework designed for multi-condition scenarios. DrawMotion generates motions from both a conventional text condition and a novel hand-drawing condition, which provide semantic and spatial control over the generated motions, respectively. Specifically, we tackle the fine-grained motion generation task from three perspectives: 1) Freehand drawing condition. To capture users' intended motions accurately without requiring tedious textual input, we develop an algorithm that automatically generates hand-drawn stickman sketches across different dataset formats. 2) Multi-condition fusion. We propose a Multi-Condition Module (MCM) that is integrated into the diffusion process, enabling the model to exploit all possible condition combinations while reducing computational complexity compared to conventional approaches. 3) Training-free guidance. Notably, the MCM in DrawMotion ensures that its intermediate features lie in a continuous space, so classifier-guidance gradients can update these features and align the generated motions with user intentions while preserving fidelity. Quantitative experiments and user studies demonstrate that the freehand drawing approach reduces the time users need to generate a motion matching what they have in mind by approximately 46.7%. The code, demos, and relevant data are publicly available at https://github.com/InvertedForest/DrawMotion.
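To make the idea of training-free guidance concrete, the following is a toy sketch and not the DrawMotion implementation. It assumes the quantity being updated is a single 3D pose (standing in for the MCM's continuous intermediate features), an orthographic camera that simply drops depth, and a squared-distance loss against 2D stickman keypoints. All names (`project`, `sketch_loss`, `guidance_step`, `scale`) are hypothetical.

```python
from typing import List, Tuple

Joint3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def project(pose: List[Joint3D]) -> List[Point2D]:
    """Orthographic projection (assumed camera model): drop the depth axis."""
    return [(x, y) for x, y, _ in pose]


def sketch_loss(pose: List[Joint3D], sketch: List[Point2D]) -> float:
    """Sum of squared 2D distances between projected joints and sketch points."""
    return sum((px - sx) ** 2 + (py - sy) ** 2
               for (px, py), (sx, sy) in zip(project(pose), sketch))


def sketch_grad(pose: List[Joint3D], sketch: List[Point2D]) -> List[Joint3D]:
    """Analytic gradient of sketch_loss with respect to each 3D joint."""
    return [(2.0 * (x - sx), 2.0 * (y - sy), 0.0)
            for (x, y, _), (sx, sy) in zip(pose, sketch)]


def guidance_step(pose: List[Joint3D], sketch: List[Point2D],
                  scale: float = 0.25) -> List[Joint3D]:
    """One guidance update: move the pose against the loss gradient."""
    grad = sketch_grad(pose, sketch)
    return [(x - scale * gx, y - scale * gy, z - scale * gz)
            for (x, y, z), (gx, gy, gz) in zip(pose, grad)]


# Hypothetical three-joint pose and a hand-drawn target in image coordinates.
pose = [(0.0, 0.0, 1.0), (0.0, 1.0, 1.0), (1.0, 2.0, 0.5)]
sketch = [(0.0, 0.0), (0.5, 1.0), (1.0, 1.0)]

loss_before = sketch_loss(pose, sketch)
for _ in range(10):
    pose = guidance_step(pose, sketch)
loss_after = sketch_loss(pose, sketch)
```

In this toy setting the sketch constrains only the image-plane coordinates, so depth is left untouched by the gradient. In a diffusion sampler, an update of this kind would be applied at denoising steps alongside the learned model, which is what keeps the result plausible rather than merely matching the sketch.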