FASTER: Rethinking Real-Time Flow VLAs
March 19, 2026
Authors: Yuxiang Lu, Zhe Liu, Xianzhe Fan, Zhenya Yang, Jinghua Hou, Junyi Li, Kaixin Ding, Hengshuang Zhao
cs.AI
Abstract
Real-time execution is crucial for deploying Vision-Language-Action (VLA) models in the physical world. Existing asynchronous inference methods primarily optimize trajectory smoothness but neglect the critical latency of reacting to environmental changes. By rethinking the notion of reaction in action chunking policies, this paper presents a systematic analysis of the factors governing reaction time. We show that reaction time follows a uniform distribution determined jointly by the Time to First Action (TTFA) and the execution horizon. Moreover, we reveal that the standard practice of applying a constant schedule in flow-based VLAs can be inefficient: it forces the system to complete all sampling steps before any movement can start, which becomes the bottleneck in reaction latency. To overcome this issue, we propose Fast Action Sampling for ImmediaTE Reaction (FASTER). By introducing a Horizon-Aware Schedule, FASTER adaptively prioritizes near-term actions during flow sampling, compressing the denoising of the immediate reaction tenfold into a single step (e.g., in π0.5 and X-VLA) while preserving the quality of the long-horizon trajectory. Coupled with a streaming client-server pipeline, FASTER substantially reduces the effective reaction latency on real robots, especially when deployed on consumer-grade GPUs. Real-world experiments, including a highly dynamic table tennis task, show that FASTER unlocks unprecedented real-time responsiveness for generalist policies, enabling rapid generation of accurate and smooth trajectories.
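The abstract states that reaction time is uniformly distributed, with bounds set by TTFA and the execution horizon. The sketch below shows one simple timing model that produces such a distribution. It is an illustration under stated assumptions, not the paper's analysis. The assumptions are: the policy takes a new observation once per execution window of H actions at a fixed control period dt; an environment change occurs at a uniformly random moment within the current window; and the first action conditioned on the next observation starts TTFA seconds after that observation. Under this model, reaction time ~ U[TTFA, TTFA + H·dt]. The function names, dt, and all numeric settings are hypothetical.

```python
import random
import statistics


def reaction_time_bounds(ttfa: float, exec_horizon: int, dt: float) -> tuple[float, float]:
    """Support [lo, hi] of the assumed uniform reaction-time distribution."""
    period = exec_horizon * dt  # assumed: seconds between successive observations
    return ttfa, ttfa + period


def expected_reaction_time(ttfa: float, exec_horizon: int, dt: float) -> float:
    """Mean of U[ttfa, ttfa + exec_horizon * dt] under the assumed model."""
    lo, hi = reaction_time_bounds(ttfa, exec_horizon, dt)
    return (lo + hi) / 2


def sample_reaction_time(ttfa: float, exec_horizon: int, dt: float,
                         rng: random.Random) -> float:
    """Draw one reaction time under the assumed (hypothetical) timing model.

    The change happens at a uniformly random moment inside the current
    execution window. The policy only sees it at the next observation, and
    the first action conditioned on that observation starts TTFA later.
    """
    period = exec_horizon * dt
    elapsed = rng.uniform(0.0, period)  # time since the last observation
    wait = period - elapsed             # time until the next observation
    return wait + ttfa


def monte_carlo(ttfa: float, exec_horizon: int, dt: float,
                n: int = 100_000, seed: int = 0) -> list[float]:
    """Draw n reaction-time samples with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    return [sample_reaction_time(ttfa, exec_horizon, dt, rng) for _ in range(n)]


if __name__ == "__main__":
    # Hypothetical settings: 50 Hz control (dt = 0.02 s), a 10-action execution
    # horizon, and two illustrative TTFA values (not measurements from the paper).
    for ttfa in (0.10, 0.02):
        samples = monte_carlo(ttfa, exec_horizon=10, dt=0.02)
        lo, hi = reaction_time_bounds(ttfa, 10, 0.02)
        print(f"TTFA={ttfa:.2f}s  support=[{lo:.2f}, {hi:.2f}]s  "
              f"analytic mean={expected_reaction_time(ttfa, 10, 0.02):.3f}s  "
              f"MC mean={statistics.fmean(samples):.3f}s")
```

In this toy model, lowering TTFA shifts the entire distribution earlier by the same amount, and shortening the execution horizon narrows it. This matches the abstract's point that cutting the sampling work needed before the first action directly reduces reaction latency.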