Body Transformer: Leveraging Robot Embodiment for Policy Learning

Abstract

In recent years, the transformer architecture has become the de facto standard for machine learning algorithms applied to natural language processing and computer vision. Despite notable evidence of successful deployment of this architecture in the context of robot learning, we claim that vanilla transformers do not fully exploit the structure of the robot learning problem. Therefore, we propose Body Transformer (BoT), an architecture that leverages the robot embodiment by providing an inductive bias that guides the learning process. We represent the robot body as a graph of sensors and actuators, and rely on masked attention to pool information throughout the architecture. The resulting architecture outperforms the vanilla transformer, as well as the classical multilayer perceptron, in terms of task completion, scaling properties, and computational efficiency when representing either imitation or reinforcement learning policies. Additional material including the open-source code is available at https://sferrazza.cc/bot_site.
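
The idea of treating the robot body as a graph and restricting attention with a mask can be illustrated with a minimal sketch. The abstract does not say how the mask is built, so the code below assumes that each node may attend only to itself and to its immediate neighbors in the body graph. The node names, the edge list, and the functions `body_mask` and `masked_attention` are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
import math

# Hypothetical 4-node kinematic chain: torso - hip - knee - ankle.
# Node names and edges are illustrative assumptions, not taken from the paper.
NODES = ["torso", "hip", "knee", "ankle"]
EDGES = [(0, 1), (1, 2), (2, 3)]


def body_mask(num_nodes, edges):
    """Boolean mask: node i may attend to node j if i == j or they share an edge."""
    mask = [[i == j for j in range(num_nodes)] for i in range(num_nodes)]
    for i, j in edges:
        mask[i][j] = True
        mask[j][i] = True
    return mask


def masked_attention(queries, keys, values, mask):
    """Single-head scaled dot-product attention restricted by a boolean mask."""
    dim = len(queries[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # Disallowed pairs get a score of -inf, so softmax assigns them zero weight.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) if mask[i][j] else -math.inf
            for j, k in enumerate(keys)
        ]
        peak = max(scores)  # finite, because every node may attend to itself
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        all_weights.append(weights)
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs, all_weights


# Toy per-node embeddings (one token per sensor/actuator location).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
mask = body_mask(len(NODES), EDGES)
outputs, weights = masked_attention(tokens, tokens, tokens, mask)
```

In this toy chain the torso token places zero weight on the knee and ankle tokens, because they are not its graph neighbors. Under the neighbor-only assumption, stacking several such layers would let information travel further along the body graph, one hop per layer.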
