Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text
November 13, 2023
Authors: Zhongfei Qing, Zhongang Cai, Zhitao Yang, Lei Yang
cs.AI
Abstract
Generating natural human motion from a story has the potential to transform
the landscape of animation, gaming, and film industries. A new and challenging
task, Story-to-Motion, arises when characters are required to move to various
locations and perform specific motions based on a long text description. This
task demands a fusion of low-level control (trajectories) and high-level
control (motion semantics). Previous works in character control and
text-to-motion have addressed related aspects, yet a comprehensive solution
remains elusive: character control methods do not handle text descriptions,
whereas text-to-motion methods lack position constraints and often produce
unstable motions. In light of these limitations, we propose a novel system that
generates controllable, infinitely long motions and trajectories aligned with
the input text. (1) We leverage contemporary Large Language Models to act as a
text-driven motion scheduler to extract a series of (text, position, duration)
pairs from long text. (2) We develop a text-driven motion retrieval scheme that
incorporates motion matching with motion semantics and trajectory constraints.
(3) We design a progressive mask transformer that addresses common artifacts in
transition motions, such as unnatural poses and foot sliding. Beyond its
pioneering role as the first comprehensive solution for Story-to-Motion, our
system undergoes evaluation across three distinct sub-tasks: trajectory
following, temporal action composition, and motion blending, where it
outperforms previous state-of-the-art motion synthesis methods across the
board. Homepage: https://story2motion.github.io/.
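
Since the abstract only outlines the pipeline, the following is a minimal sketch of how the LLM scheduler's (text, position, duration) output and the constrained retrieval step could fit together. It is an illustration under stated assumptions, not the authors' implementation: MotionEvent, MotionClip, the cost weights, and the cosine-similarity text encoder are all hypothetical.

```python
# Minimal sketch (not the paper's code) of Story-to-Motion's scheduling and
# retrieval stages. All names and weights below are illustrative assumptions.
from dataclasses import dataclass

import numpy as np


@dataclass
class MotionEvent:
    """One (text, position, duration) pair produced by the LLM scheduler."""
    text: str                      # motion semantics, e.g. "walks to the desk"
    position: tuple[float, float]  # target (x, z) location on the ground plane
    duration: float                # seconds allotted to the event


@dataclass
class MotionClip:
    """A candidate clip from a motion database."""
    embedding: np.ndarray      # semantic embedding of the clip (hypothetical)
    end_position: np.ndarray   # where the clip's root trajectory ends, (x, z)
    length: float              # clip duration in seconds


def retrieval_cost(event_embedding: np.ndarray, event: MotionEvent,
                   clip: MotionClip, w_sem: float = 1.0,
                   w_traj: float = 1.0, w_dur: float = 0.1) -> float:
    """Score one candidate (lower is better) by combining (a) semantic
    mismatch between the event text and the clip, (b) trajectory error
    against the target position, and (c) duration mismatch -- a stand-in
    for "motion matching with motion semantics and trajectory constraints".
    """
    sem = 1.0 - float(
        event_embedding @ clip.embedding
        / (np.linalg.norm(event_embedding) * np.linalg.norm(clip.embedding))
    )
    traj = float(np.linalg.norm(clip.end_position - np.asarray(event.position)))
    dur = abs(clip.length - event.duration)
    return w_sem * sem + w_traj * traj + w_dur * dur


def retrieve(event: MotionEvent, event_embedding: np.ndarray,
             database: list[MotionClip]) -> MotionClip:
    """Pick the clip that best satisfies all three constraints."""
    return min(database,
               key=lambda clip: retrieval_cost(event_embedding, event, clip))


# Toy usage: one scheduled event matched against a random clip database.
rng = np.random.default_rng(0)
db = [MotionClip(rng.normal(size=64), rng.normal(size=2), 3.0)
      for _ in range(8)]
event = MotionEvent("walks to the desk", (2.0, 1.5), 3.0)
best = retrieve(event, rng.normal(size=64), db)
```

A real motion-matching stage would typically also match the character's current pose to keep transitions continuous; the sketch keeps only the three constraints named in the abstract.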