Self-Adapting Improvement Loops for Robotic Learning
June 7, 2025
Authors: Calvin Luo, Zilai Zeng, Mingxi Jia, Yilun Du, Chen Sun
cs.AI
Abstract
Video generative models trained on expert demonstrations have been utilized
as performant text-conditioned visual planners for solving robotic tasks.
However, generalization to unseen tasks remains a challenge. While
generalization could be improved by leveraging prior knowledge learned from
additional pre-collected offline data sources, such as web-scale video
datasets, in the era of experience we aim to design agents that can
continuously improve online from self-collected behaviors. In this
work we thus propose the Self-Adapting Improvement Loop (SAIL), where an
in-domain video model iteratively updates itself on self-produced trajectories,
collected through adaptation with an internet-scale pretrained video model, and
steadily improves its performance for a specified task of interest. We apply
SAIL to a diverse suite of MetaWorld tasks, as well as two manipulation tasks
on a real robot arm, and find that performance improvements continuously emerge
over multiple iterations for novel tasks initially unseen during original
in-domain video model training. Furthermore, we find that SAIL is
surprisingly robust to whether and how the self-collected experience is
filtered, and to the quality of the initial in-domain demonstrations. Through
adaptation with a video model pretrained on internet-scale data, and learning
from online experience, we thus demonstrate a way to iteratively bootstrap a
high-performance video model for solving novel robotic tasks through
self-improvement.
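The abstract describes the loop only at a high level. The following sketch is a purely illustrative toy and is not the authors' implementation. It shows the adapt, collect, optionally filter, and update structure of such a loop under several loud assumptions. Both video models are replaced by 1-D Gaussians over a scalar action. "Adaptation" is stood in for by a product-of-experts composition, p_in(x) · p_prior(x)^w, chosen only for illustration. A rollout counts as successful if it lands within a tolerance of a hypothetical target. "Finetuning" is a maximum-likelihood refit of the in-domain model to its own collected data. All names (`GaussianModel`, `adapt`, `self_improvement_loop`, `target`, `tol`, `weight`, `filter_success`) are hypothetical.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass
class GaussianModel:
    """Hypothetical toy stand-in for a video planner: a 1-D Gaussian over actions."""
    mean: float
    std: float

    def sample(self, rng: random.Random) -> float:
        return rng.gauss(self.mean, self.std)


def adapt(in_domain: GaussianModel, prior: GaussianModel, weight: float) -> GaussianModel:
    """Hypothetical adaptation: normalized product p_in(x) * p_prior(x)**weight.

    For Gaussians this adds precisions and precision-weights the means.
    """
    prec_in = 1.0 / in_domain.std ** 2
    prec_prior = weight / prior.std ** 2
    prec = prec_in + prec_prior
    mean = (prec_in * in_domain.mean + prec_prior * prior.mean) / prec
    return GaussianModel(mean, (1.0 / prec) ** 0.5)


def self_improvement_loop(in_domain: GaussianModel, prior: GaussianModel, target: float, *,
                          iterations: int = 5, rollouts: int = 200, tol: float = 0.5,
                          weight: float = 1.0, filter_success: bool = True,
                          min_std: float = 0.1, seed: int = 0):
    """Repeat: adapt -> collect rollouts -> optionally filter -> refit the in-domain model."""
    rng = random.Random(seed)
    history = []  # success rate measured at each iteration
    for _ in range(iterations):
        planner = adapt(in_domain, prior, weight)
        actions = [planner.sample(rng) for _ in range(rollouts)]
        successes = [a for a in actions if abs(a - target) < tol]
        history.append(len(successes) / rollouts)
        # Keep only successful rollouts, or all of them, depending on the flag.
        data = successes if filter_success else actions
        if len(data) >= 2:  # skip the update if there is too little data to fit
            # Maximum-likelihood refit; the std floor prevents collapse to a point mass.
            in_domain = GaussianModel(statistics.fmean(data),
                                      max(statistics.pstdev(data), min_std))
    return in_domain, history


if __name__ == "__main__":
    for filt in (True, False):
        model, hist = self_improvement_loop(GaussianModel(0.0, 1.0), GaussianModel(3.0, 1.0),
                                            target=3.0, filter_success=filt)
        print(f"filter_success={filt}:", [round(h, 2) for h in hist], model)
```

In this toy, the prior is deliberately centered on the target, so both settings of `filter_success` move the in-domain model toward it over iterations. That outcome is a property of the construction and is not evidence about SAIL's behavior.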