Premier-TACO: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss
February 9, 2024
Authors: Ruijie Zheng, Yongyuan Liang, Xiyao Wang, Shuang Ma, Hal Daumé III, Huazhe Xu, John Langford, Praveen Palanisamy, Kalyan Shankar Basu, Furong Huang
cs.AI
Abstract
We present Premier-TACO, a multitask feature representation learning approach
designed to improve few-shot policy learning efficiency in sequential
decision-making tasks. Premier-TACO leverages a subset of multitask offline
datasets for pretraining a general feature representation, which captures
critical environmental dynamics and is fine-tuned using minimal expert
demonstrations. It advances the temporal action contrastive learning (TACO)
objective, known for state-of-the-art results in visual control tasks, by
incorporating a novel negative example sampling strategy. This strategy is
crucial in significantly boosting TACO's computational efficiency, making
large-scale multitask offline pretraining feasible. Our extensive empirical
evaluation on a diverse set of continuous control benchmarks, including
Deepmind Control Suite, MetaWorld, and LIBERO, demonstrates Premier-TACO's
effectiveness in pretraining visual representations, significantly enhancing
few-shot imitation learning on novel tasks. Our code, pretraining data, and
pretrained model checkpoints will be released at
https://github.com/PremierTACO/premier-taco.