Robot Policy Adaptation via Weight-Space Meta-Learning

Abstract

Vision-Language-Action (VLA) models are emerging as a promising paradigm for robotic manipulation, enabling general-purpose policies trained on large corpora of demonstrations and action labels. However, adapting these models to a new task still typically requires task-specific demonstrations, action annotations, and additional fine-tuning, which makes deployment costly and difficult to scale. We propose WIZARD, a weight-space meta-learning framework that sidesteps task-specific fine-tuning by generating task-specific LoRA parameters for a frozen VLA policy. Given only a language instruction and a short demonstration video, WIZARD predicts the corresponding adaptation weights in a single forward pass, without target-task action labels or test-time optimization. During meta-training, WIZARD learns to map task evidence directly to expert LoRA updates, capturing relationships between tasks in weight space. Experiments on LIBERO show that WIZARD improves performance by up to roughly 2x on unseen dataset collections and by up to roughly 14x on unseen tasks. On a Franka Emika Panda robot, WIZARD consistently improves over a baseline adapted to the real domain, showing that the generated adapters provide task-level specialization beyond simulation.
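
The core idea of the abstract, predicting LoRA factors from task evidence and applying them to a frozen layer, can be illustrated with a small sketch. This is not the WIZARD architecture, which the abstract does not specify. The `LoRAGenerator` class, the linear generator weights, the dimensions, and the random `task_embedding` (standing in for an encoded language instruction and demonstration video) are all hypothetical choices made for illustration. Pure Python lists are used so that the example has no dependencies.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def random_matrix(rows, cols, rng, scale=0.1):
    """Small Gaussian matrix; a stand-in for learned or pretrained weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

class LoRAGenerator:
    """Hypothetical generator: maps a task embedding to LoRA factors (A, B).

    A has shape (rank, d_in) and B has shape (d_out, rank), so B @ A is a
    low-rank update with the same shape as the frozen weight matrix.
    """

    def __init__(self, embed_dim, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        # Assumption: a single linear map per factor. A real generator
        # would likely be a deeper network trained during meta-training.
        self.w_a = random_matrix(embed_dim, rank * d_in, rng)
        self.w_b = random_matrix(embed_dim, d_out * rank, rng)

    def __call__(self, task_embedding):
        # One forward pass: embedding -> flattened factors -> reshaped factors.
        flat_a = matmul([task_embedding], self.w_a)[0]
        flat_b = matmul([task_embedding], self.w_b)[0]
        a = [flat_a[i * self.d_in:(i + 1) * self.d_in] for i in range(self.rank)]
        b = [flat_b[i * self.rank:(i + 1) * self.rank] for i in range(self.d_out)]
        return a, b

def adapted_forward(w_frozen, a, b, x):
    """Compute y = (W + B A) x without modifying the frozen weights W."""
    delta = matmul(b, a)  # low-rank update, shape (d_out, d_in)
    w_eff = [[w + d for w, d in zip(rw, rd)] for rw, rd in zip(w_frozen, delta)]
    return [row[0] for row in matmul(w_eff, [[v] for v in x])]

rng = random.Random(1)
w_frozen = random_matrix(3, 4, rng)  # one frozen policy layer (d_out=3, d_in=4)
generator = LoRAGenerator(embed_dim=5, d_in=4, d_out=3, rank=2)
task_embedding = [rng.gauss(0.0, 1.0) for _ in range(5)]  # placeholder task evidence
a, b = generator(task_embedding)
y = adapted_forward(w_frozen, a, b, [1.0, 0.0, -1.0, 0.5])
```

In this sketch the frozen weights are never updated; only the generated factors change from task to task, which is what allows adaptation without test-time optimization. Under the abstract's description, meta-training would fit the generator so that its outputs match expert LoRA updates, a step this sketch omits.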