ChatPaper.ai

VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference

November 30, 2025
Authors: Jiaming Tang, Yufei Sun, Yilong Zhao, Shang Yang, Yujun Lin, Zhuoyang Zhang, James Hou, Yao Lu, Zhijian Liu, Song Han
cs.AI

Abstract
Vision-Language-Action models (VLAs) are becoming increasingly capable across diverse robotic tasks. However, their real-world deployment remains slow and inefficient: demonstration videos are often sped up by 5-10x to appear smooth, with noticeable action stalls and delayed reactions to environmental changes. Asynchronous inference offers a promising solution to achieve continuous and low-latency control by enabling robots to execute actions and perform inference simultaneously. However, because the robot and environment continue to evolve during inference, a temporal misalignment arises between the prediction and execution intervals. This leads to significant action instability, while existing methods either degrade accuracy or introduce runtime overhead to mitigate it. We propose VLASH, a general asynchronous inference framework for VLAs that delivers smooth, accurate, and fast reaction control without additional overhead or architectural changes. VLASH estimates the future execution-time state by rolling the robot state forward with the previously generated action chunk, thereby bridging the gap between prediction and execution. Experiments show that VLASH achieves up to 2.03x speedup and reduces reaction latency by up to 17.4x compared to synchronous inference while fully preserving the original accuracy. Moreover, it empowers VLAs to handle fast-reaction, high-precision tasks such as playing ping-pong and playing whack-a-mole, where traditional synchronous inference fails. Code is available at https://github.com/mit-han-lab/vlash
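The core idea in the abstract, conditioning the policy on an estimate of the state at execution time rather than the stale state observed when inference starts, can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a toy 1-D robot whose actions are position deltas, and the names `roll_forward`, `vlash_step`, and `policy` are hypothetical.

```python
def roll_forward(state, actions):
    """Roll the robot state forward by simulating the pending actions.

    Hypothetical 1-D dynamics: each action is a position delta, so the
    future state is the current state plus the deltas that will be
    executed while inference is still running.
    """
    for a in actions:
        state = state + a
    return state

def vlash_step(current_state, prev_chunk, steps_elapsed, inference_steps, policy):
    """One asynchronous inference step in the style described by VLASH (sketch).

    The robot keeps executing `prev_chunk` while inference for the next
    chunk runs; the result only becomes available `inference_steps`
    control ticks from now. Conditioning on `current_state` would
    misalign the prediction and execution intervals, so we instead
    estimate the execution-time state by rolling forward with the
    actions from the previous chunk that will run in the meantime.
    """
    # Actions that will be executed between now and when inference finishes.
    pending = prev_chunk[steps_elapsed : steps_elapsed + inference_steps]
    future_state = roll_forward(current_state, pending)
    # Query the policy on the estimated future state, not the stale one.
    return policy(future_state)

# Toy usage: a constant-step policy; 3 ticks of the old chunk still run
# before the new chunk arrives, so the policy sees the rolled-forward state.
next_chunk = vlash_step(0.0, [0.1] * 8, steps_elapsed=2,
                        inference_steps=3, policy=lambda s: [0.05] * 4)
```

Under these assumptions the new chunk is predicted for the state the robot will actually be in when the chunk starts executing, which is what removes the temporal misalignment without extra runtime cost: the pending actions are already known, so the roll-forward is essentially free.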