Robot Policy Adaptation via Weight-Space Meta-Learning

Abstract

Vision-Language-Action (VLA) models are emerging as a promising paradigm for robotic manipulation, enabling general-purpose policies trained on large corpora of demonstrations and action labels. However, adapting these models to new tasks still typically requires task-specific demonstrations, action annotations, and additional fine-tuning, which makes deployment costly and difficult to scale. We propose WIZARD, a weight-space meta-learning framework that sidesteps task-specific fine-tuning by generating task-specific LoRA parameters for a frozen VLA policy. Given only a language instruction and a short demonstration video, WIZARD predicts the corresponding adaptation weights in a single forward pass, without target-task action labels or test-time optimization. During meta-training, WIZARD learns to map task evidence directly to expert LoRA updates, capturing relationships between tasks in weight space. Experiments on LIBERO show that WIZARD improves performance by up to about 2x on unseen dataset collections and up to about 14x on unseen tasks. On a Franka Emika Panda robot, WIZARD consistently improves over a real-domain adapted baseline, showing that generated adapters provide task-level specialization beyond simulation.
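The idea of predicting LoRA weights from task evidence in one forward pass can be illustrated with a small sketch. This is not the WIZARD architecture, which the abstract does not specify. It assumes a hypothetical generator (`LoRAGenerator`) that is a single linear map from a task embedding to the low-rank factors A and B of one frozen layer, and a hypothetical meta-training loss (`lora_regression_loss`) that regresses the predicted factors onto expert factors with mean squared error. The embeddings, dimensions, and the `alpha / rank` scaling are likewise illustrative assumptions.

```python
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def reshape(flat, rows, cols):
    """Reshape a flat list into a rows x cols matrix."""
    return [flat[i * cols:(i + 1) * cols] for i in range(rows)]


class LoRAGenerator:
    """Hypothetical generator: one linear map from task embedding to LoRA factors."""

    def __init__(self, embed_dim, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        n_out = rank * d_in + d_out * rank
        self.H = [[rng.gauss(0.0, 0.1) for _ in range(embed_dim)]
                  for _ in range(n_out)]

    def __call__(self, task_embedding):
        # Single forward pass: no gradient steps on the target task.
        flat = matvec(self.H, task_embedding)
        split = self.rank * self.d_in
        A = reshape(flat[:split], self.rank, self.d_in)   # rank x d_in
        B = reshape(flat[split:], self.d_out, self.rank)  # d_out x rank
        return A, B


def adapted_forward(W, A, B, x, alpha=1.0):
    """Frozen layer plus generated low-rank update: W x + (alpha / rank) * B (A x)."""
    rank = len(A)
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / rank) * d for b, d in zip(base, delta)]


def lora_regression_loss(pred, expert):
    """Assumed meta-training objective: MSE between predicted and expert factors."""
    diffs = [p - e
             for P, E in zip(pred, expert)
             for row_p, row_e in zip(P, E)
             for p, e in zip(row_p, row_e)]
    return sum(d * d for d in diffs) / len(diffs)


# Task evidence: placeholder embeddings of the instruction and the demo video.
lang_emb = [0.2, -0.1]
video_emb = [0.5, 0.3]
task_embedding = lang_emb + video_emb

gen = LoRAGenerator(embed_dim=4, d_in=3, d_out=2, rank=2)
A, B = gen(task_embedding)

W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # frozen base weights, never updated
y = adapted_forward(W, A, B, [1.0, 2.0, 3.0])
```

In this toy setup only the generator matrix `H` would be trained during meta-training, while `W` stays fixed. Regressing on A and B directly is a simplification, since many factor pairs yield the same product B A.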