CLARE: Continual Learning for Vision-Language-Action Models via Autonomous Adapter Routing and Expansion

Abstract

To teach robots complex manipulation tasks, it is now a common practice to fine-tune a pre-trained vision-language-action model (VLA) on task-specific data. However, since this recipe updates existing representations, it is unsuitable for long-term operation in the real world, where robots must continually adapt to new tasks and environments while retaining the knowledge they have already acquired. Existing continual learning methods for robotics commonly require storing previous data (exemplars), struggle with long task sequences, or rely on task identifiers for deployment. To address these limitations, we propose CLARE, a general, parameter-efficient framework for exemplar-free continual learning with VLAs. CLARE introduces lightweight modular adapters into selected feedforward layers and autonomously expands the model only where necessary when learning a new task, guided by layer-wise feature similarity. During deployment, an autoencoder-based routing mechanism dynamically activates the most relevant adapters without requiring task labels. Through extensive experiments on the LIBERO benchmark, we show that CLARE achieves high performance on new tasks without catastrophic forgetting of earlier tasks, significantly outperforming even exemplar-based methods. Code and data are available at https://tum-lsy.github.io/clare.
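The routing and expansion ideas in the abstract can be illustrated with a toy sketch. The abstract does not specify the routing criterion, the autoencoder architecture, or the expansion rule, so everything below is an assumption made for illustration: each adapter is paired with a rank-1 linear autoencoder, routing picks the adapter whose autoencoder reconstructs the input features with the lowest error, and a hypothetical threshold decides whether a new adapter is needed. The names `reconstruction_error`, `route`, and `needs_new_adapter` are not taken from the CLARE code.

```python
import math


def reconstruction_error(x, direction):
    """Squared error of a rank-1 linear autoencoder (illustrative assumption).

    Encode: z = <u, x> with u the normalized direction. Decode: z * u.
    """
    norm = math.sqrt(sum(w * w for w in direction))
    unit = [w / norm for w in direction]
    code = sum(u * xi for u, xi in zip(unit, x))
    return sum((xi - code * u) ** 2 for xi, u in zip(x, unit))


def route(x, autoencoders):
    """Return (index, error) of the adapter whose autoencoder fits x best."""
    errors = [reconstruction_error(x, w) for w in autoencoders]
    best = min(range(len(errors)), key=errors.__getitem__)
    return best, errors[best]


def needs_new_adapter(x, autoencoders, threshold):
    """Hypothetical expansion rule: expand when no autoencoder fits x well."""
    if not autoencoders:
        return True
    _, err = route(x, autoencoders)
    return err > threshold


# Two adapters, each summarized by one feature direction (toy values).
autoencoders = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
features = [0.1, 2.0, 0.0]

index, error = route(features, autoencoders)
expand_for_known = needs_new_adapter(features, autoencoders, threshold=0.5)
expand_for_novel = needs_new_adapter([0.0, 0.0, 3.0], autoencoders, threshold=0.5)
```

In this toy setup the features lie almost entirely along the second direction, so the second adapter is selected without any task label, while features along an unseen direction exceed the threshold and would trigger expansion. A real system would presumably use learned nonlinear autoencoders on per-layer features, which this sketch does not attempt to reproduce.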
