Premier-TACO: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss

Abstract

We present Premier-TACO, a multitask feature representation learning approach designed to improve few-shot policy learning efficiency in sequential decision-making tasks. Premier-TACO pretrains a general feature representation on a subset of multitask offline datasets; the representation captures critical environmental dynamics and is then fine-tuned using minimal expert demonstrations. The method advances the temporal action contrastive learning (TACO) objective, known for state-of-the-art results in visual control tasks, by incorporating a novel negative example sampling strategy. This strategy is crucial to significantly boosting TACO's computational efficiency, making large-scale multitask offline pretraining feasible. Our extensive empirical evaluation on a diverse set of continuous control benchmarks, including DeepMind Control Suite, MetaWorld, and LIBERO, demonstrates Premier-TACO's effectiveness in pretraining visual representations and shows that it significantly enhances few-shot imitation learning of novel tasks. Our code, pretraining data, and pretrained model checkpoints will be released at https://github.com/PremierTACO/premier-taco.
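The abstract names a contrastive objective but does not spell it out. As a rough illustration only, the sketch below shows a generic InfoNCE-style contrastive loss in plain Python. It is not the TACO or Premier-TACO objective, and it does not reproduce the paper's negative example sampling strategy, which the abstract does not describe. The names `info_nce`, `queries`, `keys`, and `temperature` are hypothetical. Here `queries[i]` is assumed to stand for an embedding of a state together with an action sequence, and `keys[i]` for an embedding of the resulting future state.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(queries, keys, temperature=1.0):
    """Mean InfoNCE loss over a batch (illustrative sketch only).

    keys[i] is treated as the positive for queries[i]; every other key
    in the batch is treated as a negative. This batch-wide choice of
    negatives is an assumption made for illustration.
    """
    losses = []
    for i, q in enumerate(queries):
        logits = [dot(q, k) / temperature for k in keys]
        # Numerically stable log-sum-exp over all candidate keys.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching key as the correct class.
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)


if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[5.0, 0.0], [0.0, 5.0]]      # positives match their queries
    mismatched = [[0.0, 5.0], [5.0, 0.0]]   # positives swapped
    print(info_nce(queries, aligned), info_nce(queries, mismatched))
```

The loss is small when each query scores highest against its own key and large when it scores higher against another key in the batch. How negatives are chosen is exactly the part that the abstract says Premier-TACO changes, so the batch-wide negatives used here should be read as a placeholder.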
