CLARE: Continual Learning for Vision-Language-Action Models via Autonomous Adapter Routing and Expansion
January 14, 2026
Authors: Ralf Römer, Yi Zhang, Angela P. Schoellig
cs.AI
Abstract
To teach robots complex manipulation tasks, it is now common practice to fine-tune a pre-trained vision-language-action model (VLA) on task-specific data. However, since this recipe updates existing representations, it is unsuitable for long-term operation in the real world, where robots must continually adapt to new tasks and environments while retaining the knowledge they have already acquired. Existing continual learning methods for robotics commonly require storing previous data (exemplars), struggle with long task sequences, or rely on task identifiers at deployment. To address these limitations, we propose CLARE, a general, parameter-efficient framework for exemplar-free continual learning with VLAs. CLARE introduces lightweight modular adapters into selected feedforward layers and autonomously expands the model only where necessary when learning a new task, guided by layer-wise feature similarity. During deployment, an autoencoder-based routing mechanism dynamically activates the most relevant adapters without requiring task labels. Through extensive experiments on the LIBERO benchmark, we show that CLARE achieves high performance on new tasks without catastrophic forgetting of earlier tasks, significantly outperforming even exemplar-based methods. Code and data are available at https://tum-lsy.github.io/clare.
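To make the routing idea in the abstract concrete, the following is a minimal, hypothetical PyTorch sketch of autoencoder-based adapter routing: one lightweight autoencoder is kept per learned task, and at deployment the adapter whose autoencoder reconstructs the current layer features with the lowest error is activated, so no task label is needed. All names (`FeatureAutoencoder`, `route_adapter`), layer sizes, and the use of mean-squared reconstruction error as the routing score are illustrative assumptions, not CLARE's exact implementation.

```python
# Hypothetical sketch of autoencoder-based adapter routing (not CLARE's exact code):
# each task keeps a small autoencoder trained on that task's features; at deployment,
# the adapter whose autoencoder reconstructs the current features best is activated.
import torch
import torch.nn as nn


class FeatureAutoencoder(nn.Module):
    """Compresses and reconstructs layer features for one task (illustrative sizes)."""

    def __init__(self, feat_dim: int = 512, bottleneck: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, bottleneck), nn.ReLU())
        self.decoder = nn.Linear(bottleneck, feat_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))


@torch.no_grad()
def route_adapter(features: torch.Tensor, autoencoders: list[FeatureAutoencoder]) -> int:
    """Return the index of the task adapter to activate.

    The reconstruction error of each task's autoencoder serves as a familiarity
    score for the current input, so no task identifier is required.
    """
    errors = []
    for ae in autoencoders:
        recon = ae(features)
        errors.append(torch.mean((recon - features) ** 2).item())
    return int(torch.tensor(errors).argmin())


if __name__ == "__main__":
    feat = torch.randn(1, 512)                       # features from a selected feedforward layer
    aes = [FeatureAutoencoder() for _ in range(3)]   # one autoencoder per previously learned task
    print("activate adapter", route_adapter(feat, aes))
```

Reconstruction error is a natural routing score here because each autoencoder only ever sees features from its own task during training, so inputs from other tasks tend to reconstruct poorly.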