Robotic Policy Adaptation via Weight-Space Meta-Learning
June 5, 2026
Authors: Christian Bianchi, Siamak Yousefi, Alessio Sampieri, Andrea Roberti, Luca Rigazio, Fabio Galasso, Luca Franco
cs.AI
Abstract
Vision-Language-Action (VLA) models are emerging as a promising paradigm for robotic manipulation, enabling general-purpose policies trained from large corpora of demonstrations and action labels. However, adapting these models to new tasks still typically requires task-specific demonstrations, action annotations, and additional fine-tuning, making deployment costly and difficult to scale.
We propose WIZARD, a weight-space meta-learning framework that sidesteps task-specific fine-tuning by generating task-specific LoRA parameters for a frozen VLA policy. Given only a language instruction and a short demonstration video, WIZARD predicts the corresponding adaptation weights in a single forward pass, without target-task action labels or test-time optimization. During meta-training, WIZARD learns to map task evidence directly to expert LoRA updates, capturing relationships between tasks in weight space.
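The abstract does not specify WIZARD's architecture, so the following is only a minimal sketch of the general idea under stated assumptions: a hypernetwork maps a task embedding to LoRA factors for one frozen linear layer in a single forward pass, with no gradient steps on the target task. Everything here is hypothetical and chosen for illustration: the class `LoRAHypernet`, its linear projections `H_A` and `H_B`, the layer sizes, the scaling `ALPHA`, and the use of a random vector as a stand-in for an encoded language instruction plus demonstration video. The meta-training loss is also an assumption: it matches the generated low-rank update ΔW = (α/r)·BA to an expert's ΔW rather than comparing A and B directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for one adapted layer and the task embedding.
D_IN, D_OUT, RANK, EMB = 16, 8, 4, 32
ALPHA = 8.0  # assumed LoRA scaling numerator; the effective scale is ALPHA / RANK

# Frozen base-policy weight for a single linear layer; never updated.
W0 = rng.normal(size=(D_OUT, D_IN))


class LoRAHypernet:
    """Hypothetical linear hypernetwork: task embedding -> LoRA factors (A, B)."""

    def __init__(self, emb_dim, d_in, d_out, rank, rng):
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        self.H_A = rng.normal(scale=0.02, size=(rank * d_in, emb_dim))
        self.H_B = rng.normal(scale=0.02, size=(d_out * rank, emb_dim))

    def __call__(self, task_emb):
        # One forward pass produces the adapter; no test-time optimization.
        A = (self.H_A @ task_emb).reshape(self.rank, self.d_in)
        B = (self.H_B @ task_emb).reshape(self.d_out, self.rank)
        return A, B


def delta_w(A, B, alpha=ALPHA):
    # Low-rank weight update implied by the LoRA factors.
    return (alpha / A.shape[0]) * (B @ A)


def adapted_forward(x, A, B, alpha=ALPHA):
    # Standard LoRA forward pass: y = W0 x + (alpha / r) * B A x, with W0 frozen.
    return W0 @ x + delta_w(A, B, alpha) @ x


def meta_loss(hyper, task_emb, expert_A, expert_B):
    # Assumed meta-training target: match the expert's update in weight space.
    A, B = hyper(task_emb)
    diff = delta_w(A, B) - delta_w(expert_A, expert_B)
    return float(np.mean(diff ** 2))


# Usage: generate an adapter from a stand-in task embedding and run the layer.
hyper = LoRAHypernet(EMB, D_IN, D_OUT, RANK, rng)
task_emb = rng.normal(size=EMB)  # placeholder for encoded instruction + demo video
A, B = hyper(task_emb)
x = rng.normal(size=D_IN)
y = adapted_forward(x, A, B)
print(y.shape)
```

Comparing ΔW instead of the raw factors is one reasonable choice because LoRA factors are not unique: for any invertible r×r matrix M, the pairs (B, A) and (BM, M⁻¹A) yield the same update, so a loss on A and B alone could penalize functionally identical adapters.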
Experiments on LIBERO show that WIZARD improves performance by up to ~2x on unseen dataset collections and up to ~14x on unseen tasks. On a Franka Emika Panda robot, WIZARD consistently improves over a real-domain-adapted baseline, showing that the generated adapters provide task-level specialization beyond simulation.