FASTER: Rethinking Real-Time Flow VLAs

Abstract

Real-time execution is crucial for deploying Vision-Language-Action (VLA) models in the physical world. Existing asynchronous inference methods primarily optimize trajectory smoothness but neglect the latency incurred when reacting to environmental changes. By rethinking the notion of reaction in action-chunking policies, this paper presents a systematic analysis of the factors governing reaction time. We show that reaction time follows a uniform distribution determined jointly by the Time to First Action (TTFA) and the execution horizon. Moreover, we reveal that the standard practice of applying a constant schedule in flow-based VLAs can be inefficient: it forces the system to complete all sampling steps before any movement can start, which makes it the bottleneck in reaction latency. To overcome this issue, we propose Fast Action Sampling for ImmediaTE Reaction (FASTER). By introducing a Horizon-Aware Schedule, FASTER adaptively prioritizes near-term actions during flow sampling, compressing the denoising of the immediate reaction tenfold, into a single step (e.g., in π_{0.5} and X-VLA), while preserving the quality of long-horizon trajectories. Coupled with a streaming client-server pipeline, FASTER substantially reduces the effective reaction latency on real robots, especially when deployed on consumer-grade GPUs. Real-world experiments, including a highly dynamic table tennis task, demonstrate that FASTER unlocks unprecedented real-time responsiveness for generalist policies, enabling rapid generation of accurate and smooth trajectories.
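The claim that reaction time is uniformly distributed can be illustrated with a small Monte Carlo sketch. The timing model here is a simplifying assumption made for illustration, not the paper's derivation: an event arrives at a uniformly random moment within one execution cycle, the policy only observes it at the next inference start, and the first responsive action appears one TTFA later. Under that assumption, reaction time is uniform on [TTFA, TTFA + horizon × control period]. The function names and all numeric values (`ttfa`, `horizon_steps`, `control_dt`) are hypothetical.

```python
import random


def simulate_reaction_times(ttfa, horizon_steps, control_dt,
                            n_samples=100_000, seed=0):
    """Sample reaction times under the assumed (simplified) timing model.

    ttfa          -- Time to First Action in seconds (hypothetical value)
    horizon_steps -- number of actions executed per chunk (execution horizon)
    control_dt    -- control period in seconds
    """
    rng = random.Random(seed)
    # Assumed time between two consecutive inference starts.
    cycle = horizon_steps * control_dt
    samples = []
    for _ in range(n_samples):
        # The event lands at a random phase of the cycle, so the wait
        # until the next inference start is uniform on [0, cycle].
        wait = rng.uniform(0.0, cycle)
        # The first action that reflects the event arrives one TTFA later.
        samples.append(wait + ttfa)
    return samples


def expected_reaction_time(ttfa, horizon_steps, control_dt):
    """Mean of Uniform[ttfa, ttfa + horizon_steps * control_dt]."""
    return ttfa + 0.5 * horizon_steps * control_dt


if __name__ == "__main__":
    # Hypothetical numbers: 100 ms TTFA, 10-step horizon, 30 Hz control.
    times = simulate_reaction_times(ttfa=0.1, horizon_steps=10,
                                    control_dt=1 / 30)
    print("empirical mean:", sum(times) / len(times))
    print("analytic mean: ", expected_reaction_time(0.1, 10, 1 / 30))
    print("range:", min(times), max(times))
```

In this simplified model both terms matter: lowering TTFA shifts the whole distribution earlier, while shortening the execution horizon narrows it.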
