Robot Policy Adaptation via Weight-Space Meta-Learning

Abstract

Vision-Language-Action (VLA) models are emerging as a promising paradigm for robotic manipulation, enabling general-purpose policies trained from large corpora of demonstrations and action labels. However, adapting these models to new tasks typically still requires task-specific demonstrations, action annotations, and additional fine-tuning, which makes deployment costly and difficult to scale. We propose WIZARD, a weight-space meta-learning framework that sidesteps task-specific fine-tuning by generating task-specific LoRA parameters for a frozen VLA policy. Given only a language instruction and a short demonstration video, WIZARD predicts the corresponding adaptation weights in a single forward pass, without target-task action labels or test-time optimization. During meta-training, WIZARD learns to map task evidence directly to expert LoRA updates, capturing relationships between tasks in weight space. Experiments on LIBERO show that WIZARD improves performance by up to ~2x on unseen dataset collections and up to ~14x on unseen tasks. On a Franka Emika Panda robot, WIZARD consistently improves over a real-domain-adapted baseline, showing that the generated adapters provide task-level specialization beyond simulation.
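
To make the idea of generating LoRA parameters for a frozen policy concrete, the following is a minimal NumPy sketch, not the WIZARD architecture itself. Everything in it is an assumption made for illustration: the layer sizes, the rank, the single linear hypernetwork `H`, the function names `generate_lora` and `adapted_forward`, and the random vector standing in for encoded instruction and video features. It only shows the general pattern of predicting low-rank factors in one forward pass and applying them on top of an unchanged base weight.

```python
import numpy as np

rng = np.random.default_rng(0)

D_IN, D_OUT = 16, 8   # size of one frozen policy layer (illustrative)
RANK = 2              # LoRA rank (illustrative)
D_TASK = 12           # size of the task-evidence embedding (illustrative)

# Frozen base weight: never modified during adaptation.
W_frozen = rng.standard_normal((D_OUT, D_IN))
W_snapshot = W_frozen.copy()  # kept only to check that W_frozen stays untouched

# Hypothetical hypernetwork: a single linear map from the task embedding to
# the flattened LoRA factors. A real system would meta-train this map.
N_A, N_B = RANK * D_IN, D_OUT * RANK
H = rng.standard_normal((N_A + N_B, D_TASK)) * 0.01


def generate_lora(task_embedding):
    """Predict LoRA factors (A, B) for one task in a single forward pass."""
    flat = H @ task_embedding
    A = flat[:N_A].reshape(RANK, D_IN)
    B = flat[N_A:].reshape(D_OUT, RANK)
    return A, B


def adapted_forward(x, A, B):
    """Frozen layer plus low-rank update, equivalent to (W + B A) x."""
    return W_frozen @ x + B @ (A @ x)


# Stand-in for encoded language-instruction and demonstration-video features.
task_embedding = rng.standard_normal(D_TASK)
A, B = generate_lora(task_embedding)

x = rng.standard_normal(D_IN)
y = adapted_forward(x, A, B)
```

In this sketch no gradient step is taken at adaptation time: a new task only requires a new `task_embedding`, and the base weight `W_frozen` is left as it was. How WIZARD encodes the task evidence and which layers receive adapters are not specified in the abstract and are not modeled here.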