CLARE: Continual Learning for Vision-Language-Action Models via Autonomous Adapter Routing and Expansion

Abstract

To teach robots complex manipulation tasks, it is now common practice to fine-tune a pre-trained vision-language-action model (VLA) on task-specific data. However, because this recipe updates existing representations, it is unsuitable for long-term operation in the real world, where robots must continually adapt to new tasks and environments while retaining the knowledge they have already acquired. Existing continual learning methods for robotics commonly require storing previous data (exemplars), struggle with long task sequences, or rely on task identifiers at deployment. To address these limitations, we propose CLARE, a general, parameter-efficient framework for exemplar-free continual learning with VLAs. CLARE introduces lightweight modular adapters into selected feedforward layers and, guided by layer-wise feature similarity, autonomously expands the model only where necessary when learning a new task. During deployment, an autoencoder-based routing mechanism dynamically activates the most relevant adapters without requiring task labels. Through extensive experiments on the LIBERO benchmark, we show that CLARE achieves high performance on new tasks without catastrophic forgetting of earlier tasks, significantly outperforming even exemplar-based methods. Code and data are available at https://tum-lsy.github.io/clare.
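The routing and expansion ideas described in the abstract can be illustrated with a small toy sketch. It is not the CLARE implementation. It assumes, purely for illustration, that each adapter is paired with an autoencoder, that routing picks the adapter whose autoencoder gives the lowest reconstruction error, and that a new adapter is added only when no existing autoencoder reconstructs the new task's features within a threshold. The names `ToyAutoencoder`, `route`, `maybe_expand`, and the `threshold` value are hypothetical, and a 1-D linear bottleneck stands in for a learned autoencoder.

```python
import math
import random


class ToyAutoencoder:
    """Hypothetical stand-in for a learned autoencoder: a 1-D linear bottleneck."""

    def __init__(self, features):
        n, d = len(features), len(features[0])
        self.mean = [sum(f[i] for f in features) / n for i in range(d)]
        centered = [[f[i] - self.mean[i] for i in range(d)] for f in features]
        # Power iteration on the (unnormalized) covariance finds the leading direction.
        w = [1.0 / math.sqrt(d)] * d
        for _ in range(50):
            proj = [sum(c[i] * w[i] for i in range(d)) for c in centered]
            w_new = [sum(p * c[i] for p, c in zip(proj, centered)) for i in range(d)]
            norm = math.sqrt(sum(v * v for v in w_new))
            if norm < 1e-12:
                break
            w = [v / norm for v in w_new]
        self.direction = w

    def reconstruction_error(self, x):
        centered = [xi - mi for xi, mi in zip(x, self.mean)]
        z = sum(c * w for c, w in zip(centered, self.direction))  # encode
        residual = [c - z * w for c, w in zip(centered, self.direction)]  # x - decode(z)
        return sum(r * r for r in residual)


def route(autoencoders, x):
    """Return the index of the adapter whose autoencoder reconstructs x best."""
    errors = [ae.reconstruction_error(x) for ae in autoencoders]
    return min(range(len(errors)), key=errors.__getitem__)


def maybe_expand(autoencoders, new_features, threshold):
    """Add a new autoencoder (and, conceptually, an adapter) only if none fits.

    Assumption: mean reconstruction error is used here as the similarity proxy.
    """
    if autoencoders:
        mean_errors = [
            sum(ae.reconstruction_error(x) for x in new_features) / len(new_features)
            for ae in autoencoders
        ]
        if min(mean_errors) <= threshold:
            return False  # an existing adapter already covers these features
    autoencoders.append(ToyAutoencoder(new_features))
    return True


rng = random.Random(0)


def sample_task(center, direction, n=50):
    """Synthetic features: points spread along `direction` around `center`."""
    points = []
    for _ in range(n):
        t = rng.gauss(0.0, 1.0)
        points.append(
            [c + t * u + rng.gauss(0.0, 0.05) for c, u in zip(center, direction)]
        )
    return points


task_a = sample_task([0.0, 0.0, 0.0], [1.0, 0.0, 0.0])
task_b = sample_task([0.0, 5.0, 5.0], [0.0, 1.0, 0.0])

autoencoders = []
expanded_a = maybe_expand(autoencoders, task_a, threshold=1.0)
expanded_b = maybe_expand(autoencoders, task_b, threshold=1.0)
# Fresh samples from task A's distribution should reuse the first adapter.
expanded_a_again = maybe_expand(
    autoencoders, sample_task([0.0, 0.0, 0.0], [1.0, 0.0, 0.0]), threshold=1.0
)

print(expanded_a, expanded_b, expanded_a_again, len(autoencoders))
print(route(autoencoders, task_a[0]), route(autoencoders, task_b[0]))
```

In this sketch no task label is passed to `route`; the choice depends only on the input features, which mirrors the label-free deployment described above. How the actual method measures feature similarity and selects layers for expansion is not specified in the abstract.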

