
Masked Trajectory Models for Prediction, Representation, and Control

May 4, 2023
Authors: Philipp Wu, Arjun Majumdar, Kevin Stone, Yixin Lin, Igor Mordatch, Pieter Abbeel, Aravind Rajeswaran
cs.AI

Abstract

We introduce Masked Trajectory Models (MTM) as a generic abstraction for sequential decision making. MTM takes a trajectory, such as a state-action sequence, and aims to reconstruct the trajectory conditioned on random subsets of the same trajectory. By training with a highly randomized masking pattern, MTM learns versatile networks that can take on different roles or capabilities, by simply choosing appropriate masks at inference time. For example, the same MTM network can be used as a forward dynamics model, inverse dynamics model, or even an offline RL agent. Through extensive experiments in several continuous control tasks, we show that the same MTM network -- i.e. same weights -- can match or outperform specialized networks trained for the aforementioned capabilities. Additionally, we find that state representations learned by MTM can significantly accelerate the learning speed of traditional RL algorithms. Finally, in offline RL benchmarks, we find that MTM is competitive with specialized offline RL algorithms, despite MTM being a generic self-supervised learning method without any explicit RL components. Code is available at https://github.com/facebookresearch/mtm
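To make the mask-selection idea concrete, here is a minimal illustrative sketch (not the API of the facebookresearch/mtm repository) of how inference-time masks over an interleaved state-action token sequence could correspond to different capabilities. The token layout, helper names, and masking probability are assumptions made for illustration only.

```python
import numpy as np

T = 4  # trajectory length (illustrative)
# Assume a trajectory is laid out as alternating tokens:
#   s_0, a_0, s_1, a_1, ..., s_{T-1}, a_{T-1}
# A mask value of 1 means the token is visible to the model;
# 0 means the model must reconstruct it.

def forward_dynamics_mask(T):
    """Observe states and actions up to the last step; hide the final state."""
    mask = np.ones(2 * T, dtype=int)
    mask[2 * (T - 1)] = 0      # hide the final state token (to be predicted)
    mask[2 * (T - 1) + 1] = 0  # the final action is not needed; hide it too
    return mask

def inverse_dynamics_mask(T):
    """Observe all states; hide every action so the model infers them."""
    mask = np.ones(2 * T, dtype=int)
    mask[1::2] = 0             # action tokens sit at odd positions
    return mask

def random_training_mask(T, p_visible=0.5, rng=np.random.default_rng(0)):
    """Highly randomized masking of the kind used during MTM training."""
    return (rng.random(2 * T) < p_visible).astype(int)

print("forward dynamics:", forward_dynamics_mask(T))
print("inverse dynamics:", inverse_dynamics_mask(T))
print("training mask   :", random_training_mask(T))
```

The same trained reconstruction network would be queried with each of these masks; only the pattern of hidden tokens changes which role (forward model, inverse model, or policy-like completion) it plays.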