Premier-TACO: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss

February 9, 2024
Authors: Ruijie Zheng, Yongyuan Liang, Xiyao Wang, Shuang Ma, Hal Daumé III, Huazhe Xu, John Langford, Praveen Palanisamy, Kalyan Shankar Basu, Furong Huang
cs.AI

Abstract

We present Premier-TACO, a multitask feature representation learning approach designed to improve few-shot policy learning efficiency in sequential decision-making tasks. Premier-TACO leverages a subset of multitask offline datasets for pretraining a general feature representation, which captures critical environmental dynamics and is fine-tuned using minimal expert demonstrations. It advances the temporal action contrastive learning (TACO) objective, known for state-of-the-art results in visual control tasks, by incorporating a novel negative example sampling strategy. This strategy is crucial in significantly boosting TACO's computational efficiency, making large-scale multitask offline pretraining feasible. Our extensive empirical evaluation on a diverse set of continuous control benchmarks, including Deepmind Control Suite, MetaWorld, and LIBERO, demonstrates Premier-TACO's effectiveness in pretraining visual representations, significantly enhancing few-shot imitation learning of novel tasks. Our code, pretraining data, and pretrained model checkpoints will be released at https://github.com/PremierTACO/premier-taco.
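
The abstract describes a contrastive pretraining objective in which the representation of a current state together with an action is trained to be predictive of a near-future state's representation. Below is a minimal, illustrative sketch of such an InfoNCE-style temporal action-driven contrastive loss. It is not the authors' released implementation: the function and module names, tensor shapes, temperature, and the use of in-batch negatives are assumptions, and Premier-TACO's specific negative-sampling strategy (the paper's contribution) is not reproduced here.

```python
# Illustrative sketch only (not the authors' released code): an InfoNCE-style
# temporal action-driven contrastive loss with in-batch negatives.
import torch
import torch.nn.functional as F


def temporal_action_contrastive_loss(state_emb, action_emb, future_emb, proj, temperature=0.1):
    """
    state_emb:  (B, D) embedding of the current state s_t
    action_emb: (B, D) embedding of the action (or action sequence) taken from s_t
    future_emb: (B, D) embedding of the future state s_{t+k}
    proj:       module mapping the concatenated (state, action) features back to dimension D
    Other samples in the batch act as negatives, a simplification of the
    paper's negative-sampling strategy.
    """
    query = F.normalize(proj(torch.cat([state_emb, action_emb], dim=-1)), dim=-1)  # (B, D)
    keys = F.normalize(future_emb, dim=-1)                                         # (B, D)
    logits = query @ keys.t() / temperature   # (B, B) similarity matrix; positives on the diagonal
    labels = torch.arange(query.size(0), device=query.device)
    return F.cross_entropy(logits, labels)
```

Here `proj` could be, for instance, a `torch.nn.Linear(2 * d, d)` layer on top of state and action encoders of output dimension `d`; during pretraining a loss of this form would be minimized over the multitask offline dataset before the representation is fine-tuned with a small number of expert demonstrations.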