

Recurrent-Depth VLA: Implicit Test-Time Compute Scaling of Vision-Language-Action Models via Latent Iterative Reasoning

February 8, 2026
Authors: Yalcin Tur, Jalal Naghiyev, Haoquan Fang, Wei-Chuan Tsai, Jiafei Duan, Dieter Fox, Ranjay Krishna
cs.AI

Abstract

Current Vision-Language-Action (VLA) models rely on fixed computational depth, expending the same amount of compute on simple adjustments and complex multi-step manipulation. While Chain-of-Thought (CoT) prompting enables variable computation, it scales memory linearly and is ill-suited for continuous action spaces. We introduce Recurrent-Depth VLA (RD-VLA), an architecture that achieves computational adaptivity via latent iterative refinement rather than explicit token generation. RD-VLA employs a recurrent, weight-tied action head that supports arbitrary inference depth with a constant memory footprint. The model is trained using truncated backpropagation through time (TBPTT) to efficiently supervise the refinement process. At inference, RD-VLA dynamically allocates compute using an adaptive stopping criterion based on latent convergence. Experiments on challenging manipulation tasks show that recurrent depth is critical: tasks that fail entirely (0 percent success) with single-iteration inference exceed 90 percent success with four iterations, while simpler tasks saturate rapidly. RD-VLA provides a scalable path to test-time compute in robotics, replacing token-based reasoning with latent reasoning to achieve constant memory usage and up to 80x inference speedup over prior reasoning-based VLA models. Project page: https://rd-vla.github.io/
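The core mechanism described in the abstract, a weight-tied recurrent action head that iteratively refines a latent action state and stops adaptively when the latent converges, can be illustrated with a minimal sketch. This is not the authors' implementation: the module names, latent and action dimensions, iteration budget, and convergence threshold `tol` below are all assumptions, and the TBPTT training procedure is not shown.

```python
import torch
import torch.nn as nn


class RecurrentActionHead(nn.Module):
    """Illustrative weight-tied recurrent action head (not the official RD-VLA code).

    A single shared refinement block is applied repeatedly, so memory stays
    constant no matter how many refinement iterations run at inference time.
    """

    def __init__(self, latent_dim: int = 512, action_dim: int = 7):
        super().__init__()
        # One block reused at every iteration (weight tying).
        self.refine = nn.Sequential(
            nn.Linear(2 * latent_dim, latent_dim),
            nn.GELU(),
            nn.Linear(latent_dim, latent_dim),
        )
        self.to_action = nn.Linear(latent_dim, action_dim)

    def forward(self, vlm_features: torch.Tensor, max_iters: int = 8,
                tol: float = 1e-3) -> torch.Tensor:
        """Refine a latent action state conditioned on fixed VLM features.

        Stops early once the latent update falls below `tol`, a stand-in for
        the paper's latent-convergence stopping criterion.
        """
        z = torch.zeros_like(vlm_features)            # initial latent action state
        for _ in range(max_iters):
            z_next = self.refine(torch.cat([z, vlm_features], dim=-1))
            delta = (z_next - z).norm(dim=-1).mean()  # size of this latent update
            z = z_next
            if delta < tol:                           # adaptive stopping
                break
        return self.to_action(z)


# Usage sketch: simple observations tend to converge after few iterations,
# harder ones consume more of the budget, with no growth in memory.
features = torch.randn(1, 512)   # placeholder for features from the VLM backbone
head = RecurrentActionHead()
action = head(features, max_iters=8)
```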