T3M: Text Guided 3D Human Motion Synthesis from Speech

Abstract

Speech-driven 3D motion synthesis seeks to create lifelike animations from human speech, with potential applications in virtual reality, gaming, and film production. Existing approaches rely solely on speech audio for motion generation, leading to inaccurate and inflexible synthesis results. To mitigate this problem, we introduce a novel text-guided 3D human motion synthesis method, termed T3M. Unlike traditional approaches, T3M allows precise control over motion synthesis via textual input, enhancing the degree of diversity and user customization. Experimental results demonstrate that T3M can greatly outperform state-of-the-art methods in both quantitative metrics and qualitative evaluations. We have publicly released our code at https://github.com/Gloria2tt/T3M.git.
