Robot Policy Adaptation via Weight-Space Meta-Learning

Abstract

Vision-Language-Action (VLA) models are emerging as a promising paradigm for robotic manipulation, enabling general-purpose policies to be trained on large corpora of demonstrations with action labels. However, adapting these models to new tasks typically still requires task-specific demonstrations, action annotations, and additional fine-tuning, which makes deployment costly and difficult to scale. We propose WIZARD, a weight-space meta-learning framework that sidesteps task-specific fine-tuning by generating task-specific LoRA parameters for a frozen VLA policy. Given only a language instruction and a short demonstration video, WIZARD predicts the corresponding adaptation weights in a single forward pass, requiring neither target-task action labels nor test-time optimization. During meta-training, WIZARD learns to map task evidence directly to expert LoRA updates, thereby capturing relationships between tasks in weight space. Experiments on LIBERO show that WIZARD improves performance by up to ~2× on unseen dataset collections and up to ~14× on unseen tasks. On a Franka Emika Panda robot, WIZARD consistently improves over a baseline adapted to the real domain, showing that the generated adapters provide task-level specialization beyond simulation.
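The abstract does not specify the generator architecture, the task encoders, or the meta-training loss. The sketch below is a minimal NumPy illustration of the general idea under stated assumptions. It uses a single linear hypernetwork `H` that maps a precomputed task embedding to LoRA factors for one frozen linear layer. It also assumes a mean-squared-error objective on the weight update ΔW = (α/r)·BA. All names (`generate_lora`, `adapted_forward`, `meta_loss`, `H`) and all sizes are hypothetical. They do not describe WIZARD's actual implementation.

```python
import numpy as np

# Hypothetical sizes for illustration only; not taken from the paper.
D_IN, D_OUT, RANK, EMB = 16, 8, 4, 32
ALPHA = 8.0  # LoRA scaling numerator (assumed value)

rng = np.random.default_rng(0)

# Frozen base weight of one linear layer in the VLA policy (never updated).
W_frozen = rng.normal(scale=0.1, size=(D_OUT, D_IN))

# Hypothetical hypernetwork: one linear map from the task embedding to the
# flattened LoRA factors A (RANK x D_IN) and B (D_OUT x RANK).
N_PARAMS = RANK * D_IN + D_OUT * RANK
H = rng.normal(scale=0.01, size=(N_PARAMS, EMB))


def generate_lora(task_emb):
    """Predict LoRA factors (A, B) for one task in a single forward pass."""
    flat = H @ task_emb
    A = flat[: RANK * D_IN].reshape(RANK, D_IN)
    B = flat[RANK * D_IN:].reshape(D_OUT, RANK)
    return A, B


def adapted_forward(x, A, B):
    """Frozen layer plus low-rank update: y = W x + (alpha / r) * B (A x)."""
    return W_frozen @ x + (ALPHA / RANK) * (B @ (A @ x))


def meta_loss(task_emb, A_expert, B_expert):
    """Assumed meta-training objective: match the expert's weight update
    Delta W = (alpha / r) * B A, which avoids the non-uniqueness of (A, B)."""
    A, B = generate_lora(task_emb)
    delta_pred = (ALPHA / RANK) * (B @ A)
    delta_expert = (ALPHA / RANK) * (B_expert @ A_expert)
    return float(np.mean((delta_pred - delta_expert) ** 2))


# Stand-in task embedding. In practice it would encode the language
# instruction and the demonstration video (assumed, e.g. frozen encoders).
task_emb = rng.normal(size=EMB)
A, B = generate_lora(task_emb)
x = rng.normal(size=D_IN)
y = adapted_forward(x, A, B)
print(y.shape)  # (8,)
```

Regressing on the product ΔW rather than on the raw factors is a design choice made only for this sketch. Many (A, B) pairs yield the same update, so a factor-level loss would penalize equivalent adapters. The paper may use a different target.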