Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text
November 13, 2023
Authors: Zhongfei Qing, Zhongang Cai, Zhitao Yang, Lei Yang
cs.AI
Abstract
Generating natural human motion from a story has the potential to transform
the landscape of animation, gaming, and film industries. A new and challenging
task, Story-to-Motion, arises when characters are required to move to various
locations and perform specific motions based on a long text description. This
task demands a fusion of low-level control (trajectories) and high-level
control (motion semantics). Previous works in character control and
text-to-motion have addressed related aspects, yet a comprehensive solution
remains elusive: character control methods do not handle text descriptions,
whereas text-to-motion methods lack position constraints and often produce
unstable motions. In light of these limitations, we propose a novel system that
generates controllable, infinitely long motions and trajectories aligned with
the input text. (1) We leverage contemporary Large Language Models to act as a
text-driven motion scheduler to extract a series of (text, position, duration)
pairs from long text. (2) We develop a text-driven motion retrieval scheme that
incorporates motion matching with motion semantic and trajectory constraints.
(3) We design a progressive mask transformer that addresses common artifacts in
transition motions, such as unnatural poses and foot sliding. Beyond its
pioneering role as the first comprehensive solution for Story-to-Motion, our
system undergoes evaluation across three distinct sub-tasks: trajectory
following, temporal action composition, and motion blending, where it
outperforms previous state-of-the-art motion synthesis methods across the
board. Homepage: https://story2motion.github.io/.
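To make the scheduler's output concrete, the sketch below models the (text, position, duration) pairs the abstract describes. This is an illustrative assumption, not the authors' implementation: the `MotionEntry` class, `parse_schedule` function, and the example story segments are all hypothetical, standing in for whatever structured output the LLM-based motion scheduler actually emits.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class MotionEntry:
    """One scheduled segment: what to do, where, and for how long.
    (Hypothetical structure mirroring the paper's (text, position, duration) pairs.)"""
    text: str                      # motion semantics, e.g. "walk to the window"
    position: Tuple[float, float]  # target location on the ground plane
    duration: float                # seconds allotted to this segment

def parse_schedule(
    pairs: List[Tuple[str, Tuple[float, float], float]]
) -> List[MotionEntry]:
    """Validate raw (text, position, duration) triples -- the kind an
    LLM scheduler might emit -- and wrap them as MotionEntry objects."""
    entries = []
    for text, position, duration in pairs:
        if duration <= 0:
            raise ValueError(f"non-positive duration for segment: {text!r}")
        entries.append(MotionEntry(text=text, position=position, duration=duration))
    return entries

# Example schedule for a short story snippet (invented for illustration)
schedule = parse_schedule([
    ("walk to the window", (3.0, 1.5), 4.0),
    ("wave with the right hand", (3.0, 1.5), 2.0),
    ("sit down on the chair", (0.5, 2.0), 3.0),
])
total_seconds = sum(e.duration for e in schedule)  # total animation length
```

Each entry pairs high-level control (the text) with low-level control (the target position), which is exactly the fusion the task demands; downstream, each segment would drive motion retrieval, with the transformer blending the transitions between segments.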