T3M: Text Guided 3D Human Motion Synthesis from Speech
August 23, 2024
Authors: Wenshuo Peng, Kaipeng Zhang, Sai Qian Zhang
cs.AI
Abstract
Speech-driven 3D motion synthesis seeks to create lifelike animations based
on human speech, with potential uses in virtual reality, gaming, and film
production. Existing approaches rely solely on speech audio for motion
generation, leading to inaccurate and inflexible synthesis results. To mitigate
this problem, we introduce a novel text-guided 3D human motion synthesis
method, termed T3M. Unlike traditional approaches, T3M allows precise
control over motion synthesis via textual input, enhancing the degree of
diversity and user customization. The experimental results demonstrate that
T3M greatly outperforms state-of-the-art methods in both quantitative
metrics and qualitative evaluations. We have publicly released our code at
https://github.com/Gloria2tt/T3M.git
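To make the core idea concrete, below is a minimal sketch of what dual-conditioned motion synthesis could look like: per-frame speech features drive the motion timing while a text embedding steers the style. All module names, layer sizes, and the cross-attention fusion choice here are illustrative assumptions, not the actual T3M architecture; see the released code for the real implementation.

```python
import torch
import torch.nn as nn

class TextGuidedMotionDecoder(nn.Module):
    """Illustrative sketch (not the T3M design): generate motion frames
    from speech features, conditioned on a text embedding via
    cross-attention. All dimensions are hypothetical."""

    def __init__(self, audio_dim=128, text_dim=512, motion_dim=156, hidden=256):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, hidden)
        self.text_proj = nn.Linear(text_dim, hidden)
        # Cross-attention lets every motion frame attend to the text condition.
        self.cross_attn = nn.MultiheadAttention(hidden, num_heads=4,
                                                batch_first=True)
        self.out = nn.Linear(hidden, motion_dim)

    def forward(self, audio_feats, text_emb):
        # audio_feats: (batch, frames, audio_dim), per-frame speech features
        # text_emb:    (batch, text_dim), e.g. a sentence embedding of the prompt
        h = self.audio_proj(audio_feats)                     # (B, T, H)
        t = self.text_proj(text_emb).unsqueeze(1)            # (B, 1, H)
        fused, _ = self.cross_attn(query=h, key=t, value=t)  # text-guided frames
        return self.out(h + fused)                           # (B, T, motion_dim)

# Toy usage: 2 clips, 30 frames of audio features, one text prompt each.
model = TextGuidedMotionDecoder()
audio = torch.randn(2, 30, 128)
text = torch.randn(2, 512)   # stand-in for a real text encoder output
motion = model(audio, text)
print(motion.shape)          # torch.Size([2, 30, 156])
```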