ChatPaper.ai


SARAH: Spatially Aware Real-time Agentic Humans

February 20, 2026
Authors: Evonne Ng, Siwei Zhang, Zhang Chen, Michael Zollhoefer, Alexander Richard
cs.AI

Abstract

As embodied agents become central to VR, telepresence, and digital human applications, their motion must go beyond speech-aligned gestures: agents should turn toward users, respond to their movement, and maintain natural gaze. Current methods lack this spatial awareness. We close this gap with the first real-time, fully causal method for spatially-aware conversational motion, deployable on a streaming VR headset. Given a user's position and dyadic audio, our approach produces full-body motion that aligns gestures with speech while orienting the agent according to the user. Our architecture combines a causal transformer-based VAE with interleaved latent tokens for streaming inference and a flow matching model conditioned on user trajectory and audio. To support varying gaze preferences, we introduce a gaze scoring mechanism with classifier-free guidance to decouple learning from control: the model captures natural spatial alignment from data, while users can adjust eye contact intensity at inference time. On the Embody 3D dataset, our method achieves state-of-the-art motion quality at over 300 FPS -- 3x faster than non-causal baselines -- while capturing the subtle spatial dynamics of natural conversation. We validate our approach on a live VR system, bringing spatially-aware conversational agents to real-time deployment. Please see https://evonneng.github.io/sarah/ for details.
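The abstract describes a gaze scoring mechanism steered at inference time via classifier-free guidance over a flow-matching model. The paper's exact formulation is not given here, but the standard classifier-free-guidance rule blends conditional and unconditional velocity predictions with a user-chosen weight; a minimal sketch of that rule and one Euler integration step (function names and toy values are illustrative assumptions, not the authors' code):

```python
import numpy as np

def cfg_velocity(v_cond, v_uncond, guidance_weight):
    # Classifier-free guidance for flow matching: blend the unconditional
    # velocity with the (e.g., gaze-conditioned) velocity prediction.
    #   guidance_weight = 0 -> ignore the condition entirely
    #   guidance_weight = 1 -> plain conditional prediction
    #   guidance_weight > 1 -> amplify the conditioning signal
    return v_uncond + guidance_weight * (v_cond - v_uncond)

def euler_flow_step(x, v, dt):
    # One explicit Euler step of the flow ODE dx/dt = v(x, t),
    # as used when integrating a flow-matching model at inference.
    return x + dt * v

# Toy example: a stronger weight extrapolates past the conditional velocity.
v_c, v_u = np.array([1.0, 0.0]), np.zeros(2)
v = cfg_velocity(v_c, v_u, guidance_weight=2.0)   # -> array([2., 0.])
x = euler_flow_step(np.zeros(2), v, dt=0.1)       # -> array([0.2, 0.])
```

Turning the guidance weight up or down at inference time is what would let a user adjust eye-contact intensity without retraining, matching the learning/control decoupling the abstract claims.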