FASTER: Rethinking Real-Time Flow VLAs

Abstract

Real-time execution is crucial for deploying Vision-Language-Action (VLA) models in the physical world. Existing asynchronous inference methods primarily optimize trajectory smoothness but neglect the critical latency in reacting to environmental changes. By rethinking the notion of reaction in action chunking policies, this paper presents a systematic analysis of the factors governing reaction time. We show that reaction time follows a uniform distribution determined jointly by the Time to First Action (TTFA) and the execution horizon. Moreover, we reveal that the standard practice of applying a constant schedule in flow-based VLAs can be inefficient: it forces the system to complete all sampling steps before any movement can start, which makes it the bottleneck in reaction latency. To overcome this issue, we propose Fast Action Sampling for ImmediaTE Reaction (FASTER). By introducing a Horizon-Aware Schedule, FASTER adaptively prioritizes near-term actions during flow sampling, compressing the denoising of the immediate reaction tenfold into a single step (e.g., in π₀.₅ and X-VLA) while preserving the quality of the long-horizon trajectory. Coupled with a streaming client-server pipeline, FASTER substantially reduces the effective reaction latency on real robots, especially when deployed on consumer-grade GPUs. Real-world experiments, including a highly dynamic table tennis task, demonstrate that FASTER unlocks unprecedented real-time responsiveness for generalist policies, enabling rapid generation of accurate and smooth trajectories.
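The claim that reaction time is uniformly distributed can be illustrated with a minimal simulation. The sketch below is not the paper's formulation; it rests on a simplifying assumption of our own: a new observation is taken once per execution cycle (the execution horizon times the control period), an environmental change arrives at a uniformly random phase of that cycle, and the first reactive action appears one TTFA after the next observation. Under that assumption the reaction time is uniform on [TTFA, TTFA + cycle]. The function names and the numeric values are hypothetical and chosen only for illustration.

```python
import random


def reaction_time_bounds(ttfa, horizon_steps, control_dt):
    """Return (min, max, mean) reaction time in seconds.

    Assumed model (not taken from the paper): an event waits a uniform
    time in [0, cycle] for the next observation, then one TTFA elapses
    before the first reactive action is available.
    """
    cycle = horizon_steps * control_dt  # duration of one execution cycle
    return ttfa, ttfa + cycle, ttfa + cycle / 2.0


def sample_reaction_times(ttfa, horizon_steps, control_dt, n, seed=0):
    """Draw n reaction times under the same assumed model."""
    rng = random.Random(seed)
    cycle = horizon_steps * control_dt
    samples = []
    for _ in range(n):
        phase = rng.uniform(0.0, cycle)  # when the event occurs in the cycle
        wait = cycle - phase             # time until the next observation
        samples.append(wait + ttfa)      # plus time to first action
    return samples


if __name__ == "__main__":
    # Hypothetical settings: 10-step execution horizon at a 20 ms control period.
    for ttfa in (0.10, 0.01):
        lo, hi, mean = reaction_time_bounds(ttfa, horizon_steps=10, control_dt=0.02)
        print(f"TTFA={ttfa:.2f}s -> reaction in [{lo:.2f}, {hi:.2f}] s, mean {mean:.2f} s")
```

In this simplified model, lowering TTFA shifts the whole distribution toward zero, while shortening the execution horizon narrows it. This matches the abstract's statement that the two quantities jointly determine reaction time.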
