MIBURI: Towards Expressive Interactive Gesture Synthesis
March 3, 2026
Authors: M. Hamza Mughal, Rishabh Dabral, Vera Demberg, Christian Theobalt
cs.AI
Abstract
Embodied Conversational Agents (ECAs) aim to emulate human face-to-face interaction through speech, gestures, and facial expressions. Current large language model (LLM)-based conversational agents lack embodiment and the expressive gestures essential for natural interaction. Existing ECA solutions often produce rigid, low-diversity motions that are unsuitable for human-like interaction. Generative methods for co-speech gesture synthesis, on the other hand, yield natural body gestures but depend on future speech context and require long run-times. To bridge this gap, we present MIBURI, the first online, causal framework for generating expressive full-body gestures and facial expressions synchronized with real-time spoken dialogue. We employ body-part-aware gesture codecs that encode hierarchical motion details into multi-level discrete tokens. These tokens are then generated autoregressively by a two-dimensional causal framework conditioned on LLM-based speech-text embeddings, which models both temporal dynamics and the part-level motion hierarchy in real time. Further, we introduce auxiliary objectives that encourage expressive and diverse gestures while preventing convergence to static poses. Comparative evaluations show that our causal, real-time approach produces more natural and contextually aligned gestures than recent baselines. Demo videos are available at https://vcai.mpi-inf.mpg.de/projects/MIBURI/.
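The two-dimensional causal generation described in the abstract can be pictured with a small sketch. This is an illustration under stated assumptions, not the authors' implementation: the names `generate_2d_causal` and `toy_predict`, the number of token slots per frame, and the arithmetic stand-in for the learned model are all hypothetical. The sketch only shows the ordering constraint: tokens are produced frame by frame along the time axis and, within each frame, slot by slot along the body-part/level axis, with each token depending only on past frames, earlier slots in the current frame, and the speech feature that has already arrived.

```python
from typing import Callable, List, Sequence

Token = int
Frame = List[Token]  # one token per (body part, codebook level) slot; layout is assumed


def generate_2d_causal(
    speech_feats: Sequence[float],
    num_slots: int,
    predict: Callable[[List[Frame], Frame, float], Token],
) -> List[Frame]:
    """Generate token frames online, never looking at future speech features."""
    frames: List[Frame] = []
    for feat in speech_feats:  # axis 1: time, one speech feature per step
        current: Frame = []
        for _ in range(num_slots):  # axis 2: part/level slots inside the frame
            # The predictor sees only past frames, the partial current frame,
            # and the current speech feature.
            current.append(predict(frames, current, feat))
        frames.append(current)
    return frames


def toy_predict(past: List[Frame], partial: Frame, feat: float, vocab: int = 8) -> Token:
    """Hypothetical deterministic stand-in for a learned token predictor."""
    prev = sum(past[-1]) if past else 0  # depends on the previous frame
    return (prev + 5 * len(partial) + sum(partial) + int(feat * 10)) % vocab
```

Because nothing in the loop reads ahead, running it on a prefix of the speech features yields exactly the corresponding prefix of frames, which is the property that makes streaming generation possible. In a real system the predictor would be a trained network conditioned on speech-text embeddings rather than a scalar feature.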