Robotic Policy Adaptation via Weight-Space Meta-Learning
June 5, 2026
Authors: Christian Bianchi, Siamak Yousefi, Alessio Sampieri, Andrea Roberti, Luca Rigazio, Fabio Galasso, Luca Franco
cs.AI
Abstract
Vision-Language-Action (VLA) models are emerging as a promising paradigm for robotic manipulation, enabling general-purpose policies to be trained from large corpora of demonstrations and action labels. However, adapting these models to new tasks still typically requires task-specific demonstrations, action annotations, and additional fine-tuning, making deployment costly and difficult to scale.
We propose WIZARD, a weight-space meta-learning framework that sidesteps task-specific fine-tuning by generating task-specific LoRA parameters for a frozen VLA policy. Given only a language instruction and a short demonstration video, WIZARD predicts the corresponding adaptation weights in a single forward pass, without target-task action labels or test-time optimization. During meta-training, WIZARD learns to map task evidence directly to expert LoRA updates, capturing relationships between tasks in weight space.
Experiments on LIBERO show that WIZARD improves performance by up to ~2x on unseen dataset collections and up to ~14x on unseen tasks. On a Franka Emika Panda robot, WIZARD consistently improves over a real-domain-adapted baseline, showing that generated adapters provide task-level specialization beyond simulation.
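The adapter-generation step described in the abstract (task evidence in, LoRA weights out, frozen policy untouched) can be illustrated with a toy NumPy sketch. This is not WIZARD's actual architecture: the layer sizes, the concatenation-based `encode_task`, the linear generator heads `G_A` and `G_B`, and the function names are all hypothetical, chosen only to show how a single forward pass through a generator can yield a low-rank update for one frozen linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
D_IN, D_OUT = 16, 12   # shape of one frozen linear layer in the policy
RANK = 4               # LoRA rank
D_TASK = 8             # size of the fused task-evidence embedding

# Frozen base weight of the policy layer; it is never modified below.
W_frozen = rng.standard_normal((D_OUT, D_IN))

# Assumed generator: two linear heads that map the task embedding to the
# flattened LoRA factors. In a meta-learning setup these would be the
# trained parameters; here they are random placeholders.
G_A = rng.standard_normal((RANK * D_IN, D_TASK)) * 0.1
G_B = rng.standard_normal((D_OUT * RANK, D_TASK)) * 0.1


def encode_task(instruction_emb, video_emb):
    """Fuse language and video features (placeholder: concatenation)."""
    return np.concatenate([instruction_emb, video_emb])


def generate_lora(task_emb):
    """Predict LoRA factors A (RANK x D_IN) and B (D_OUT x RANK) in one pass."""
    A = (G_A @ task_emb).reshape(RANK, D_IN)
    B = (G_B @ task_emb).reshape(D_OUT, RANK)
    return A, B


def adapted_forward(x, A, B, scale=1.0):
    """Apply the frozen layer plus the generated low-rank update B @ A."""
    return W_frozen @ x + scale * (B @ (A @ x))


# Usage: stand-in embeddings for an instruction and a demonstration video.
instruction_emb = rng.standard_normal(4)
video_emb = rng.standard_normal(4)
A, B = generate_lora(encode_task(instruction_emb, video_emb))
y = adapted_forward(rng.standard_normal(D_IN), A, B)
```

Note that no gradient step is taken at adaptation time in this sketch: the only computation for a new task is the generator's forward pass, and the update `B @ A` has rank at most `RANK` by construction.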