T3M: Text Guided 3D Human Motion Synthesis from Speech
August 23, 2024
作者: Wenshuo Peng, Kaipeng Zhang, Sai Qian Zhang
cs.AI
Abstract
Speech-driven 3D motion synthesis seeks to create lifelike animations based
on human speech, with potential uses in virtual reality, gaming, and film
production. Existing approaches rely solely on speech audio for motion
generation, leading to inaccurate and inflexible synthesis results. To mitigate
this problem, we introduce a novel text-guided 3D human motion synthesis
method, termed T3M. Unlike traditional approaches, T3M allows precise
control over motion synthesis via textual input, enhancing the degree of
diversity and user customization. Experimental results demonstrate that T3M
greatly outperforms state-of-the-art methods in both quantitative metrics and
qualitative evaluations. We have publicly released our code at
https://github.com/Gloria2tt/T3M.git
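The abstract's core idea is to condition speech-driven motion generation on an additional text input so the user can steer the synthesized motion. Below is a minimal, hypothetical PyTorch sketch of what such a text-conditioned synthesizer could look like; all module names, dimensions, and the prefix-token fusion strategy are illustrative assumptions, not T3M's actual architecture (see the released repository for that).

```python
# Hypothetical sketch of text-guided, speech-driven motion synthesis.
# None of these modules or dimensions come from the T3M paper or repo.
import torch
import torch.nn as nn

class TextGuidedMotionSynthesizer(nn.Module):
    def __init__(self, audio_dim=128, text_dim=512, hidden_dim=256, motion_dim=165):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)   # per-frame speech features
        self.text_proj = nn.Linear(text_dim, hidden_dim)     # pooled sentence embedding
        self.fuser = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.motion_head = nn.Linear(hidden_dim, motion_dim) # per-frame pose parameters

    def forward(self, audio_feats, text_embed):
        # audio_feats: (batch, frames, audio_dim); text_embed: (batch, text_dim)
        a = self.audio_proj(audio_feats)
        t = self.text_proj(text_embed).unsqueeze(1)          # text as a prefix token
        fused = self.fuser(torch.cat([t, a], dim=1))         # attend across text + audio
        return self.motion_head(fused[:, 1:])                # drop the text token

# Usage: random tensors stand in for real speech features and a text encoder's output.
model = TextGuidedMotionSynthesizer()
audio = torch.randn(1, 120, 128)   # e.g. 120 frames of audio features
text = torch.randn(1, 512)         # e.g. a CLIP-style sentence embedding
motion = model(audio, text)        # (1, 120, 165) pose parameters per frame
print(motion.shape)
```

The prefix-token fusion here is just one plausible way to inject the text condition; cross-attention or feature-wise modulation would serve the same purpose of letting a single speech clip yield different motions under different prompts.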