RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy

Abstract

Reasoning before acting and imagining potential outcomes (i.e., world models) are essential for embodied agents operating in complex open-world environments. Yet prior work either incorporates only one of these abilities into an end-to-end agent or integrates multiple specialized models into an agent system, which limits the learning efficiency and generalization of the policy. This paper therefore makes the first attempt to synergize Reasoning and Imagination in an end-to-end Generalist policy, termed RIG. To train RIG end to end, we construct a data pipeline that progressively integrates and enriches the imagination and reasoning content in trajectories collected from existing agents. Jointly learning reasoning and next-image generation explicitly models the inherent correlation between reasoning, action, and environment dynamics, yielding more than 17× better sample efficiency, along with improved generalization, compared with previous work. During inference, RIG first reasons about the next action, produces a candidate action, and then predicts the outcome of that action, which gives the agent a chance to review and self-correct based on the imagined outcome before acting in the real environment. Experimental results show that the synergy of reasoning and imagination not only improves the robustness, generalization, and interoperability of the generalist policy but also enables test-time scaling to enhance overall performance.
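
The inference procedure described above (reason, propose an action, imagine its outcome, then review before acting) can be illustrated with a toy sketch. This is not the RIG implementation: the 1-D world, the functions `propose`, `imagine`, `acceptable`, and `act_with_review`, and the `max_revisions` parameter are all hypothetical stand-ins chosen only to make the control flow concrete. In RIG itself, a single learned policy produces the reasoning text, the action, and the predicted next image.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    reasoning: str       # textual rationale for the chosen action
    action: int          # action finally selected (-1, 0, or +1)
    imagined_state: int  # outcome predicted for that action


def propose(state: int, goal: int, rejected: List[int]) -> Tuple[str, int]:
    """Hypothetical stand-in for the policy's reasoning and action output.

    Candidates are tried in a deliberately naive order so that the first
    proposal can be wrong and the review step has something to correct.
    """
    for action in (+1, -1, 0):
        if action not in rejected:
            return f"try moving {action:+d} from {state} toward {goal}", action
    return "no untried action left; stay", 0


def imagine(state: int, action: int) -> int:
    """Hypothetical stand-in for the world model: predict the next state."""
    return state + action


def acceptable(state: int, imagined: int, goal: int) -> bool:
    """Review step: accept only if the imagined outcome makes progress."""
    return imagined == goal or abs(goal - imagined) < abs(goal - state)


def act_with_review(state: int, goal: int, max_revisions: int = 2) -> Step:
    """Reason, propose, imagine, and revise up to max_revisions times."""
    rejected: List[int] = []
    for _ in range(max_revisions + 1):
        reasoning, action = propose(state, goal, rejected)
        imagined = imagine(state, action)
        if acceptable(state, imagined, goal):
            break
        rejected.append(action)  # remember the rejected action and retry
    # If the budget runs out, the last (unaccepted) proposal is returned.
    return Step(reasoning, action, imagined)


if __name__ == "__main__":
    print(act_with_review(3, 0, max_revisions=0))  # no review budget
    print(act_with_review(3, 0, max_revisions=2))  # review can self-correct
```

In this toy setting, a larger `max_revisions` lets the agent discard a proposal whose imagined outcome moves away from the goal. That is only a loose analogy for the test-time scaling mentioned in the abstract, not a reproduction of it.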
