DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control

Abstract

Imitation learning has proven to be a powerful tool for training complex visuomotor policies. However, current methods often require hundreds to thousands of expert demonstrations to handle high-dimensional visual observations. A key reason for this poor data efficiency is that visual representations are predominantly either pretrained on out-of-domain data or trained directly through a behavior cloning objective. In this work, we present DynaMo, a new in-domain, self-supervised method for learning visual representations. Given a set of expert demonstrations, we jointly learn a latent inverse dynamics model and a forward dynamics model over a sequence of image embeddings, predicting the next frame in latent space, without augmentations, contrastive sampling, or access to ground truth actions. Importantly, DynaMo does not require any out-of-domain data such as Internet datasets or cross-embodied datasets. On a suite of six simulated and real environments, we show that representations learned with DynaMo significantly improve downstream imitation learning performance over both prior self-supervised learning objectives and pretrained representations. Gains from using DynaMo hold across policy classes such as Behavior Transformer, Diffusion Policy, MLP, and nearest neighbors. Finally, we ablate key components of DynaMo and measure their impact on downstream policy performance. Robot videos are best viewed at https://dynamo-ssl.github.io.
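The joint objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the single linear layers standing in for the encoder and the two dynamics models, the dimensions, the cosine-distance loss, and every name below (`W_enc`, `W_inv`, `W_fwd`, `dynamics_loss`) are assumptions made for illustration. The sketch computes only the forward pass of the loss; training would require a framework with automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
T, OBS_DIM, EMB_DIM, ACT_DIM = 8, 64, 16, 4

# Stand-ins for learned networks: one linear layer each (an assumption).
W_enc = rng.normal(scale=0.1, size=(OBS_DIM, EMB_DIM))            # image encoder
W_inv = rng.normal(scale=0.1, size=(2 * EMB_DIM, ACT_DIM))        # inverse dynamics
W_fwd = rng.normal(scale=0.1, size=(EMB_DIM + ACT_DIM, EMB_DIM))  # forward dynamics


def dynamics_loss(frames):
    """Latent next-frame prediction loss for one demonstration.

    frames: array of shape (T, OBS_DIM) holding flattened observations.
    No ground-truth actions are used anywhere in this function.
    """
    z = frames @ W_enc                 # embeddings, shape (T, EMB_DIM)
    z_t, z_next = z[:-1], z[1:]        # consecutive embedding pairs

    # Inverse dynamics: infer a low-dimensional latent action from (z_t, z_{t+1}).
    latent_action = np.concatenate([z_t, z_next], axis=1) @ W_inv

    # Forward dynamics: predict z_{t+1} from z_t and the inferred latent action.
    z_pred = np.concatenate([z_t, latent_action], axis=1) @ W_fwd

    # Cosine distance between predicted and actual next embeddings
    # (an assumed choice of loss for this sketch).
    norms = np.linalg.norm(z_pred, axis=1) * np.linalg.norm(z_next, axis=1)
    cos = np.sum(z_pred * z_next, axis=1) / (norms + 1e-8)
    return float(np.mean(1.0 - cos))


# Random data standing in for one demonstration of T frames.
frames = rng.normal(size=(T, OBS_DIM))
loss = dynamics_loss(frames)
print(f"latent next-frame loss: {loss:.4f}")
```

In this sketch the latent action is deliberately lower-dimensional than the embedding (`ACT_DIM < EMB_DIM`), so the inverse model cannot simply pass the next embedding through to the forward model. Whether and how DynaMo enforces such a bottleneck is not stated in the abstract.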
