SeedPolicy: Horizon Scaling via Self-Evolving Diffusion Policy for Robot Manipulation

Abstract

Imitation Learning (IL) enables robots to acquire manipulation skills from expert demonstrations. Diffusion Policy (DP) models multi-modal expert behaviors, but its performance degrades as the observation horizon increases, which limits long-horizon manipulation. We propose Self-Evolving Gated Attention (SEGA), a temporal module that maintains a time-evolving latent state via gated attention. SEGA enables efficient recurrent updates that compress long-horizon observations into a fixed-size representation while filtering out irrelevant temporal information. Integrating SEGA into DP yields Self-Evolving Diffusion Policy (SeedPolicy), which resolves the temporal modeling bottleneck and enables scalable horizon extension with moderate overhead. On the RoboTwin 2.0 benchmark with 50 manipulation tasks, SeedPolicy outperforms DP and other IL baselines. Averaged across both CNN and Transformer backbones, SeedPolicy achieves a 36.8% relative improvement over DP in clean settings and a 169% relative improvement in randomized challenging settings. Compared to vision-language-action models such as RDT with 1.2B parameters, SeedPolicy achieves competitive performance with one to two orders of magnitude fewer parameters, demonstrating strong efficiency and scalability. These results establish SeedPolicy as a state-of-the-art imitation learning method for long-horizon robotic manipulation. Code is available at https://github.com/Youqiang-Gui/SeedPolicy.
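The abstract describes SEGA only at a high level, so the following is a minimal illustrative sketch of the general idea (a fixed-size latent state updated recurrently through gated cross-attention), not the authors' implementation. The class name `GatedAttentionState`, the slot count, the weight matrices, and the specific gate form (a sigmoid over the concatenated old state and attention readout) are all assumptions made for illustration; the released code at the repository above is the authoritative reference.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GatedAttentionState:
    """Hypothetical fixed-size latent state updated by gated cross-attention.

    This is an assumed formulation for illustration, not the SEGA module itself.
    """

    def __init__(self, num_slots=4, dim=8, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)
        self.dim = dim
        # Randomly initialised projections stand in for learned weights.
        self.w_q = rng.normal(0.0, scale, (dim, dim))
        self.w_k = rng.normal(0.0, scale, (dim, dim))
        self.w_v = rng.normal(0.0, scale, (dim, dim))
        self.w_g = rng.normal(0.0, scale, (2 * dim, dim))
        # The state has a fixed shape regardless of how many steps are seen.
        self.state = rng.normal(0.0, scale, (num_slots, dim))

    def update(self, obs_tokens):
        """Fold one step of observation tokens, shape (n, dim), into the state."""
        q = self.state @ self.w_q            # (slots, dim)
        k = obs_tokens @ self.w_k            # (n, dim)
        v = obs_tokens @ self.w_v            # (n, dim)
        attn = softmax(q @ k.T / np.sqrt(self.dim), axis=-1)  # (slots, n)
        candidate = attn @ v                 # (slots, dim)
        # The gate decides, per element, how much new information to admit.
        gate_in = np.concatenate([self.state, candidate], axis=-1)
        gate = sigmoid(gate_in @ self.w_g)   # values in (0, 1)
        self.state = (1.0 - gate) * self.state + gate * candidate
        return self.state


# Usage: the state stays (4, 8) however long the observation stream is.
module = GatedAttentionState(num_slots=4, dim=8)
rng = np.random.default_rng(1)
for _ in range(50):
    state = module.update(rng.normal(size=(3, 8)))
print(state.shape)  # (4, 8)
```

In this sketch the memory and per-step compute depend on the number of slots and tokens per step, not on the number of past steps, which is the property the abstract attributes to compressing long-horizon observations into a fixed-size representation.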
