SARAH: Spatially Aware Real-time Agentic Humans
February 20, 2026
Authors: Evonne Ng, Siwei Zhang, Zhang Chen, Michael Zollhoefer, Alexander Richard
cs.AI
Abstract
As embodied agents become central to VR, telepresence, and digital human applications, their motion must go beyond speech-aligned gestures: agents should turn toward users, respond to their movement, and maintain natural gaze. Current methods lack this spatial awareness. We close this gap with the first real-time, fully causal method for spatially-aware conversational motion, deployable on a streaming VR headset. Given a user's position and dyadic audio, our approach produces full-body motion that aligns gestures with speech while orienting the agent according to the user's position. Our architecture combines a causal transformer-based VAE with interleaved latent tokens for streaming inference and a flow matching model conditioned on user trajectory and audio. To support varying gaze preferences, we introduce a gaze scoring mechanism with classifier-free guidance to decouple learning from control: the model captures natural spatial alignment from data, while users can adjust eye contact intensity at inference time. On the Embody 3D dataset, our method achieves state-of-the-art motion quality at over 300 FPS -- 3x faster than non-causal baselines -- while capturing the subtle spatial dynamics of natural conversation. We validate our approach on a live VR system, bringing spatially-aware conversational agents to real-time deployment. Please see https://evonneng.github.io/sarah/ for details.
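To make the inference-time gaze control concrete, below is a minimal sketch of how classifier-free guidance can blend gaze-conditioned and unconditioned velocity predictions inside a flow-matching sampler. This is an illustrative PyTorch-style example, not the paper's implementation: the names (`cfg_gaze_velocity`, `sample_motion`), the model's call signature, and parameters such as `gaze_score`, `guidance_scale`, `num_steps`, and `latent_dim` are all assumptions made for the sketch.

```python
import torch

def cfg_gaze_velocity(model, x_t, t, audio_feat, user_traj, gaze_score, guidance_scale):
    """Hypothetical classifier-free-guidance step for gaze control.

    The flow-matching model predicts a velocity field; we blend the prediction
    conditioned on a gaze score with one where the gaze condition is dropped,
    so `guidance_scale` tunes eye-contact intensity at inference time
    without retraining.
    """
    # Velocity with the gaze-score condition present.
    v_cond = model(x_t, t, audio_feat, user_traj, gaze_score)
    # Velocity with the gaze condition dropped (e.g., replaced by a null embedding).
    v_uncond = model(x_t, t, audio_feat, user_traj, None)
    # Standard CFG blend: scale > 1 strengthens gaze alignment, < 1 relaxes it.
    return v_uncond + guidance_scale * (v_cond - v_uncond)


def sample_motion(model, audio_feat, user_traj, gaze_score,
                  guidance_scale=2.0, num_steps=8, latent_dim=256):
    """Euler integration of the guided velocity field from noise to a motion latent."""
    x = torch.randn(1, latent_dim)            # start from Gaussian noise
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((1,), i * dt)
        v = cfg_gaze_velocity(model, x, t, audio_feat, user_traj,
                              gaze_score, guidance_scale)
        x = x + dt * v                        # one Euler step along the flow
    return x                                  # decode into full-body motion downstream
```

Under these assumptions, raising `guidance_scale` would push the generated latent toward stronger eye contact, while lowering it relaxes gaze alignment, leaving the speech-synchronized gesture conditioning (audio and user trajectory) untouched.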