Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning

Abstract

Continual Reinforcement Learning (CRL) for Vision-Language-Action (VLA) models is a promising direction toward self-improving embodied agents that can adapt in open-ended, evolving environments. However, conventional wisdom from continual learning suggests that naive Sequential Fine-Tuning (Seq. FT) leads to catastrophic forgetting, necessitating complex CRL strategies. In this work, we take a step back and conduct a systematic study of CRL for large pretrained VLAs across three models and five challenging lifelong RL benchmarks. We find that, contrary to established belief, simple Seq. FT with low-rank adaptation (LoRA) is remarkably strong: it achieves high plasticity, exhibits little to no forgetting, and retains strong zero-shot generalization, frequently outperforming more sophisticated CRL methods. Through detailed analysis, we show that this robustness arises from a synergy between the large pretrained model, parameter-efficient adaptation, and on-policy RL. Together, these components reshape the stability-plasticity trade-off, making continual adaptation both stable and scalable. Our results position Seq. FT as a powerful method for continual RL with VLAs and provide new insights into lifelong learning in the large model era. Code is available at github.com/UT-Austin-RobIn/continual-vla-rl.
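For readers unfamiliar with low-rank adaptation, the following is a minimal sketch of the standard LoRA parameterization, in which a frozen pretrained weight W0 is combined with a trainable low-rank update (alpha / r) * B @ A. It is not taken from the authors' repository: the function names (`matmul`, `init_adapter`, `lora_weight`, `trainable_params`), the initialization scale, and the toy dimensions are all assumptions made for illustration, and no RL training loop is shown.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def init_adapter(out_dim, in_dim, rank, seed=0):
    """Create hypothetical LoRA factors: A is small random, B is all zeros."""
    rng = random.Random(seed)
    a = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
    b = [[0.0] * rank for _ in range(out_dim)]  # zero init: no change at the start
    return a, b


def lora_weight(w0, a, b, alpha):
    """Return the effective weight W0 + (alpha / r) * B @ A; W0 itself stays frozen."""
    rank = len(a)  # A has shape (rank, in_dim)
    scale = alpha / rank
    delta = matmul(b, a)  # shape (out_dim, in_dim), rank at most `rank`
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w0, delta)
    ]


def trainable_params(out_dim, in_dim, rank):
    """Number of trainable entries in A and B combined."""
    return rank * (in_dim + out_dim)


if __name__ == "__main__":
    w0 = [[0.5, -1.0, 2.0, 0.0], [1.5, 0.25, -0.75, 3.0], [0.0, 1.0, 0.0, -2.0]]
    a, b = init_adapter(out_dim=3, in_dim=4, rank=2)
    # With B at zero, the adapted weight is identical to the pretrained one.
    print(lora_weight(w0, a, b, alpha=4.0) == w0)
    # A 64 x 128 layer has 8192 weights; a rank-4 adapter trains far fewer.
    print(trainable_params(64, 128, 4), "trainable vs", 64 * 128, "frozen")
```

Under this sketch, one plausible reading of Seq. FT with LoRA is that only A and B are updated as tasks arrive one after another, while W0 is left untouched. Whether the authors keep a single adapter across tasks or handle it differently is not stated in the abstract.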
