
RL + Transformer = A General-Purpose Problem Solver

January 24, 2025
Authors: Micah Rentschler, Jesse Roberts
cs.AI

Abstract

What if artificial intelligence could not only solve the problems it was trained on but also learn to teach itself to solve new ones (i.e., meta-learn)? In this study, we demonstrate that a pre-trained transformer fine-tuned with reinforcement learning over multiple episodes develops the ability to solve problems it has never encountered before, an emergent ability called In-Context Reinforcement Learning (ICRL). This powerful meta-learner not only excels at solving unseen in-distribution environments with remarkable sample efficiency but also performs strongly in out-of-distribution environments. In addition, we show that it is robust to the quality of its training data, seamlessly stitches together behaviors from its context, and adapts to non-stationary environments. These behaviors demonstrate that an RL-trained transformer can iteratively improve upon its own solutions, making it an excellent general-purpose problem solver.
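As a rough illustration of what "in-context" can mean here: the abstract says the transformer is fine-tuned over multiple episodes and stitches together behaviors from its context. One plausible reading is that the model's input window keeps trajectories from earlier episodes instead of being reset at each episode boundary, so improvement can happen through the context rather than through weight updates. The minimal sketch below assumes a simple FIFO buffer of (observation, action, reward) triples; the class name `MultiEpisodeContext`, the step representation, and the truncation policy are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical step representation: (observation, action, reward).
Step = Tuple[int, int, float]


@dataclass
class MultiEpisodeContext:
    """Hypothetical buffer that keeps transitions from several consecutive
    episodes so a sequence model can condition on its own past attempts."""
    max_steps: int  # assumed context-window budget, measured in steps
    steps: List[Step] = field(default_factory=list)
    episode_ids: List[int] = field(default_factory=list)
    current_episode: int = 0

    def add(self, obs: int, action: int, reward: float) -> None:
        """Append one transition, tagged with the episode it came from."""
        self.steps.append((obs, action, reward))
        self.episode_ids.append(self.current_episode)
        # Drop the oldest steps once the window is full (FIFO truncation).
        overflow = len(self.steps) - self.max_steps
        if overflow > 0:
            del self.steps[:overflow]
            del self.episode_ids[:overflow]

    def end_episode(self) -> None:
        # The buffer is deliberately NOT cleared: the next episode still
        # sees earlier trajectories in its context.
        self.current_episode += 1

    def episodes_in_context(self) -> int:
        """Number of distinct episodes currently visible to the model."""
        return len(set(self.episode_ids))


# Usage: three 4-step episodes with a 10-step window.
ctx = MultiEpisodeContext(max_steps=10)
for ep in range(3):
    for t in range(4):
        ctx.add(obs=t, action=t % 2, reward=float(t == 3))
    ctx.end_episode()
print(len(ctx.steps), ctx.episodes_in_context())  # → 10 3
```

In this sketch a policy would receive `ctx.steps` as its input sequence at every step. The design choice that matters is that `end_episode` only advances a counter, so rewards observed in earlier attempts remain available when the model acts in later ones.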


January 27, 2025