Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text
November 13, 2023
Authors: Zhongfei Qing, Zhongang Cai, Zhitao Yang, Lei Yang
cs.AI
Abstract
Generating natural human motion from a story has the potential to transform
the landscape of animation, gaming, and film industries. A new and challenging
task, Story-to-Motion, arises when characters are required to move to various
locations and perform specific motions based on a long text description. This
task demands a fusion of low-level control (trajectories) and high-level
control (motion semantics). Previous works in character control and
text-to-motion have addressed related aspects, yet a comprehensive solution
remains elusive: character control methods do not handle text descriptions,
whereas text-to-motion methods lack position constraints and often produce
unstable motions. In light of these limitations, we propose a novel system that
generates controllable, infinitely long motions and trajectories aligned with
the input text. (1) We leverage contemporary Large Language Models to act as a
text-driven motion scheduler to extract a series of (text, position, duration)
pairs from long text. (2) We develop a text-driven motion retrieval scheme that
incorporates motion matching with motion semantics and trajectory constraints.
(3) We design a progressive mask transformer that addresses common artifacts in
transition motions, such as unnatural poses and foot sliding. Beyond its
pioneering role as the first comprehensive solution for Story-to-Motion, our
system undergoes evaluation across three distinct sub-tasks: trajectory
following, temporal action composition, and motion blending, where it
outperforms previous state-of-the-art motion synthesis methods across the
board. Homepage: https://story2motion.github.io/.
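
Since the abstract only outlines the pipeline, the following is a minimal sketch of how the LLM scheduler's (text, position, duration) output and the constrained retrieval step could fit together. It is an illustration under stated assumptions, not the authors' implementation: MotionEvent, MotionClip, the cost weights, and the cosine-similarity text encoder are all hypothetical.

```python
# Minimal sketch (not the paper's code) of Story-to-Motion's scheduling and
# retrieval stages. All names and weights below are illustrative assumptions.
from dataclasses import dataclass

import numpy as np


@dataclass
class MotionEvent:
    """One (text, position, duration) pair produced by the LLM scheduler."""
    text: str                      # motion semantics, e.g. "walks to the desk"
    position: tuple[float, float]  # target (x, z) location on the ground plane
    duration: float                # seconds allotted to the event


@dataclass
class MotionClip:
    """A candidate clip from a motion database."""
    embedding: np.ndarray      # semantic embedding of the clip (hypothetical)
    end_position: np.ndarray   # where the clip's root trajectory ends, (x, z)
    length: float              # clip duration in seconds


def retrieval_cost(event_embedding: np.ndarray, event: MotionEvent,
                   clip: MotionClip, w_sem: float = 1.0,
                   w_traj: float = 1.0, w_dur: float = 0.1) -> float:
    """Score one candidate (lower is better) by combining (a) semantic
    mismatch between the event text and the clip, (b) trajectory error
    against the target position, and (c) duration mismatch -- a stand-in
    for "motion matching with motion semantics and trajectory constraints".
    """
    sem = 1.0 - float(
        event_embedding @ clip.embedding
        / (np.linalg.norm(event_embedding) * np.linalg.norm(clip.embedding))
    )
    traj = float(np.linalg.norm(clip.end_position - np.asarray(event.position)))
    dur = abs(clip.length - event.duration)
    return w_sem * sem + w_traj * traj + w_dur * dur


def retrieve(event: MotionEvent, event_embedding: np.ndarray,
             database: list[MotionClip]) -> MotionClip:
    """Pick the clip that best satisfies all three constraints."""
    return min(database,
               key=lambda clip: retrieval_cost(event_embedding, event, clip))


# Toy usage: one scheduled event matched against a random clip database.
rng = np.random.default_rng(0)
db = [MotionClip(rng.normal(size=64), rng.normal(size=2), 3.0)
      for _ in range(8)]
event = MotionEvent("walks to the desk", (2.0, 1.5), 3.0)
best = retrieve(event, rng.normal(size=64), db)
```

A real motion-matching stage would typically also match the character's current pose to keep transitions continuous; the sketch keeps only the three constraints named in the abstract.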