CLARE: Continual Learning for Vision-Language-Action Models via Autonomous Adapter Routing and Expansion

Abstract

To teach robots complex manipulation tasks, it is now common practice to fine-tune a pre-trained vision-language-action (VLA) model on task-specific data. However, because this recipe overwrites existing representations, it is unsuitable for long-term operation in the real world, where robots must continually adapt to new tasks and environments while retaining the knowledge they have already acquired. Existing continual learning methods for robotics commonly require storing previous data (exemplars), struggle with long task sequences, or rely on task identifiers at deployment. To address these limitations, we propose CLARE, a general, parameter-efficient framework for exemplar-free continual learning with VLAs. CLARE introduces lightweight modular adapters into selected feedforward layers and, guided by layer-wise feature similarity, autonomously expands the model only where necessary when learning a new task. During deployment, an autoencoder-based routing mechanism dynamically activates the most relevant adapters without requiring task labels. Through extensive experiments on the LIBERO benchmark, we show that CLARE achieves high performance on new tasks without catastrophic forgetting of earlier tasks, significantly outperforming even exemplar-based methods. Code and data are available at https://tum-lsy.github.io/clare.
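The abstract names two mechanisms only at a high level: adding adapters at a layer only when needed, and using autoencoders to pick which adapter to activate without a task label. The sketch below is a minimal, hypothetical illustration of reconstruction-error-based routing and expansion for a single layer; it is not CLARE's implementation. Assumptions: each adapter is paired with one autoencoder over that layer's features, the layer-wise similarity signal is approximated by reconstruction error, and the autoencoder is a linear one fitted in closed form via SVD (equivalent to PCA) rather than trained by gradient descent. The names `LinearAutoencoder`, `AdapterRouter`, `maybe_expand`, `route`, `latent_dim`, and `expand_threshold` are illustrative only.

```python
import numpy as np


class LinearAutoencoder:
    """Hypothetical stand-in for a trained autoencoder: a linear AE fitted
    in closed form (PCA via SVD) on one task's layer features."""

    def __init__(self, latent_dim: int) -> None:
        self.latent_dim = latent_dim
        self.mean = None
        self.components = None  # shape (latent_dim, feature_dim)

    def fit(self, feats):
        self.mean = feats.mean(axis=0)
        centered = feats - self.mean
        # The top right-singular vectors span the subspace an optimal linear AE learns.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        self.components = vt[: self.latent_dim]
        return self

    def reconstruction_error(self, feats):
        """Per-sample mean squared reconstruction error."""
        centered = feats - self.mean
        codes = centered @ self.components.T  # encode
        recon = codes @ self.components       # decode
        return ((centered - recon) ** 2).mean(axis=1)


class AdapterRouter:
    """Hypothetical per-layer bookkeeping: one autoencoder per adapter."""

    def __init__(self, latent_dim: int, expand_threshold: float) -> None:
        self.latent_dim = latent_dim
        self.expand_threshold = expand_threshold
        self.autoencoders = []  # index i corresponds to adapter i at this layer

    def maybe_expand(self, task_feats) -> bool:
        """Add an adapter only if no existing autoencoder explains the new
        task's features well (mean error above expand_threshold)."""
        if self.autoencoders:
            best = min(float(ae.reconstruction_error(task_feats).mean())
                       for ae in self.autoencoders)
            if best <= self.expand_threshold:
                return False  # reuse existing adapters at this layer
        self.autoencoders.append(LinearAutoencoder(self.latent_dim).fit(task_feats))
        return True

    def route(self, feats):
        """Return, per sample, the index of the adapter whose autoencoder
        reconstructs it best; no task label is needed."""
        if not self.autoencoders:
            raise ValueError("no adapters to route to")
        errors = np.stack([ae.reconstruction_error(feats) for ae in self.autoencoders])
        return errors.argmin(axis=0)
```

For example, features from a second task that lie in a different subspace than the first would exceed the threshold and trigger expansion, after which `route` sends each sample to the adapter whose autoencoder reconstructs it best. The threshold trades parameter growth against reuse; its value here, and the choice of which layers to apply this to, are assumptions of the sketch.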
