Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning
December 22, 2023
作者: Filippos Christianos, Georgios Papoudakis, Matthieu Zimmer, Thomas Coste, Zhihao Wu, Jingxuan Chen, Khyati Khandelwal, James Doran, Xidong Feng, Jiacheng Liu, Zheng Xiong, Yicheng Luo, Jianye Hao, Kun Shao, Haitham Bou-Ammar, Jun Wang
cs.AI
Abstract
A key method for creating Artificial Intelligence (AI) agents is
Reinforcement Learning (RL). However, constructing a standalone RL policy that
maps perception to action directly encounters severe problems, chief among them
being its lack of generality across multiple tasks and the need for a large
amount of training data. The leading cause is that it cannot effectively
integrate prior information into the perception-action cycle when devising the
policy. Large language models (LLMs) emerged as a fundamental way to
incorporate cross-domain knowledge into AI agents but lack crucial learning and
adaptation toward specific decision problems. This paper presents a general
framework model for integrating and learning structured reasoning into AI
agents' policies. Our methodology is motivated by the modularity found in the
human brain. The framework utilises the construction of intrinsic and extrinsic
functions to add previous understandings of reasoning structures. It also
provides the adaptive ability to learn models inside every module or function,
consistent with the modular structure of cognitive processes. We describe the
framework in-depth and compare it with other AI pipelines and existing
frameworks. The paper explores practical applications, covering experiments
that show the effectiveness of our method. Our results indicate that AI agents
perform and adapt far better when organised reasoning and prior knowledge are
embedded. This opens the door to more resilient and general AI agent systems.
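The abstract describes composing the agent's policy from nested intrinsic functions (internal reasoning steps operating on the agent's memory) and an extrinsic function (which emits the environment action). The following is a minimal Python sketch of that compositional idea, under stated assumptions: the names StructuredPolicy, think, and choose_action are illustrative and are not the paper's actual API.

```python
# A minimal sketch (not the paper's actual API) of a policy composed from
# nested intrinsic functions and a final extrinsic function.
from dataclasses import dataclass
from typing import Callable, List

# An intrinsic function transforms the agent's internal memory/state
# (e.g. by producing a "thought" or a retrieved tool result); the
# extrinsic function maps the final memory to an environment action.
IntrinsicFn = Callable[[dict], dict]
ExtrinsicFn = Callable[[dict], str]

@dataclass
class StructuredPolicy:
    intrinsic_fns: List[IntrinsicFn]   # ordered reasoning modules
    extrinsic_fn: ExtrinsicFn          # action-producing module

    def act(self, observation: str) -> str:
        memory = {"observation": observation, "thoughts": []}
        # Apply the intrinsic functions in sequence: each refines the
        # internal state before the extrinsic function emits the action.
        for fn in self.intrinsic_fns:
            memory = fn(memory)
        return self.extrinsic_fn(memory)

# Hypothetical modules for illustration only.
def think(memory: dict) -> dict:
    memory["thoughts"].append(f"Reasoning about: {memory['observation']}")
    return memory

def choose_action(memory: dict) -> str:
    return f"act based on {len(memory['thoughts'])} thought(s)"

policy = StructuredPolicy(intrinsic_fns=[think], extrinsic_fn=choose_action)
print(policy.act("door is locked"))  # -> "act based on 1 thought(s)"
```

In this reading, swapping or fine-tuning an individual module (e.g. replacing think with a tool-calling or retrieval step) changes the reasoning structure without rewriting the rest of the policy, which is the modularity the abstract emphasises.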