Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning
March 12, 2026
Authors: Jiaheng Hu, Jay Shim, Chen Tang, Yoonchang Sung, Bo Liu, Peter Stone, Roberto Martin-Martin
cs.AI
Abstract
Continual Reinforcement Learning (CRL) for Vision-Language-Action (VLA) models is a promising direction toward self-improving embodied agents that can adapt in open-ended, evolving environments. However, conventional wisdom from continual learning suggests that naive Sequential Fine-Tuning (Seq. FT) leads to catastrophic forgetting, necessitating complex CRL strategies. In this work, we take a step back and conduct a systematic study of CRL for large pretrained VLAs across three models and five challenging lifelong RL benchmarks. We find that, contrary to established belief, simple Seq. FT with low-rank adaptation (LoRA) is remarkably strong: it achieves high plasticity, exhibits little to no forgetting, and retains strong zero-shot generalization, frequently outperforming more sophisticated CRL methods. Through detailed analysis, we show that this robustness arises from a synergy between the large pretrained model, parameter-efficient adaptation, and on-policy RL. Together, these components reshape the stability-plasticity trade-off, making continual adaptation both stable and scalable. Our results position Sequential Fine-Tuning as a powerful method for continual RL with VLAs and provide new insights into lifelong learning in the large model era. Code is available at github.com/UT-Austin-RobIn/continual-vla-rl.
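To make the core mechanism concrete: LoRA keeps the pretrained weight matrix frozen and trains only a low-rank additive update, which is why sequential fine-tuning can stay plastic while leaving the base model intact. Below is a minimal NumPy sketch of a LoRA-adapted linear layer; it is an illustration of the general technique under assumed shapes and hyperparameters (`rank`, `alpha`), not the paper's actual training code, which would use a deep-learning framework on full VLA models.

```python
import numpy as np

class LoRALinear:
    """Minimal LoRA sketch: a frozen base weight W plus a trainable
    low-rank update (A @ B), scaled by alpha / rank.

    Hypothetical illustration only; hyperparameters are assumptions,
    not values from the paper.
    """

    def __init__(self, d_in, d_out, rank=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen "pretrained" weight: never updated during fine-tuning.
        self.W = rng.standard_normal((d_in, d_out))
        # Trainable low-rank factors. B is zero-initialized, so at the
        # start of fine-tuning the layer is exactly the pretrained one.
        self.A = rng.standard_normal((d_in, rank)) * 0.01
        self.B = np.zeros((rank, d_out))
        self.scale = alpha / rank

    def forward(self, x):
        # y = x (W + scale * A B): base path plus adapter path.
        return x @ self.W + self.scale * (x @ self.A) @ self.B
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen base exactly; during each task, gradient updates touch only `A` and `B`, so the pretrained weights (and whatever zero-shot capabilities they encode) are never overwritten.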