SeedPolicy: Horizon Scaling via Self-Evolving Diffusion Policy for Robot Manipulation

Abstract

Imitation Learning (IL) enables robots to acquire manipulation skills from expert demonstrations. Diffusion Policy (DP) models multi-modal expert behaviors, but its performance degrades as the observation horizon grows, which limits long-horizon manipulation. We propose Self-Evolving Gated Attention (SEGA), a temporal module that maintains a time-evolving latent state via gated attention. Its efficient recurrent updates compress long-horizon observations into a fixed-size representation while filtering out irrelevant temporal information. Integrating SEGA into DP yields Self-Evolving Diffusion Policy (SeedPolicy), which resolves the temporal modeling bottleneck and enables scalable horizon extension with moderate overhead. On the RoboTwin 2.0 benchmark with 50 manipulation tasks, SeedPolicy outperforms DP and other IL baselines. Averaged across both CNN and Transformer backbones, SeedPolicy achieves a 36.8% relative improvement over DP in clean settings and a 169% relative improvement in randomized challenging settings. Compared to vision-language-action models such as RDT with 1.2B parameters, SeedPolicy achieves competitive performance with one to two orders of magnitude fewer parameters, demonstrating strong efficiency and scalability. These results establish SeedPolicy as a state-of-the-art imitation learning method for long-horizon robotic manipulation. Code is available at: https://github.com/Youqiang-Gui/SeedPolicy.
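
To make the idea of a fixed-size, time-evolving latent state more concrete, here is a minimal sketch of one possible gated-attention recurrent update. It is not the authors' SEGA implementation: the slot layout, the parameter-free scalar gate, and the names gated_attention_update, rollout, and gate_bias are all assumptions made for illustration. A real module would use learned projections and would feed the resulting state to the diffusion policy.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_attention_update(state, tokens, gate_bias=0.0):
    """One recurrent step (hypothetical): state is K slots x D, tokens is N x D."""
    dim = len(state[0])
    scale = 1.0 / math.sqrt(dim)
    new_state = []
    for slot in state:
        # Cross-attention: the slot acts as query, the tokens as keys and values.
        weights = softmax([dot(slot, tok) * scale for tok in tokens])
        candidate = [
            sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)
        ]
        # Scalar gate in [0, 1], written via tanh so it cannot overflow.
        # Assumption: the gate depends only on slot/candidate similarity.
        logit = dot(slot, candidate) * scale + gate_bias
        gate = 0.5 * (1.0 + math.tanh(0.5 * logit))
        # Blend: a low gate keeps the old memory, a high gate writes the new input.
        new_state.append(
            [(1.0 - gate) * s + gate * c for s, c in zip(slot, candidate)]
        )
    return new_state


def rollout(num_steps, num_slots=4, num_tokens=6, dim=8, seed=0):
    """Feed num_steps random observation frames through the recurrent update."""
    rng = random.Random(seed)
    state = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_slots)]
    for _ in range(num_steps):
        tokens = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_tokens)
        ]
        state = gated_attention_update(state, tokens)
    return state
```

In this sketch, rollout(1) and rollout(50) return states of the same shape (4 slots of dimension 8), which illustrates how a recurrent state can keep per-step memory constant as the observation horizon grows. The gate is what lets an update be skipped: a strongly negative gate_bias leaves the state unchanged.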
