Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning
December 22, 2023
作者: Filippos Christianos, Georgios Papoudakis, Matthieu Zimmer, Thomas Coste, Zhihao Wu, Jingxuan Chen, Khyati Khandelwal, James Doran, Xidong Feng, Jiacheng Liu, Zheng Xiong, Yicheng Luo, Jianye Hao, Kun Shao, Haitham Bou-Ammar, Jun Wang
cs.AI
Abstract
A key method for creating Artificial Intelligence (AI) agents is
Reinforcement Learning (RL). However, constructing a standalone RL policy that
maps perception to action directly encounters severe problems, chief among them
being its lack of generality across multiple tasks and the need for a large
amount of training data. The leading cause is that it cannot effectively
integrate prior information into the perception-action cycle when devising the
policy. Large language models (LLMs) emerged as a fundamental way to
incorporate cross-domain knowledge into AI agents but lack crucial learning and
adaptation toward specific decision problems. This paper presents a general
framework model for integrating and learning structured reasoning into AI
agents' policies. Our methodology is motivated by the modularity found in the
human brain. The framework utilises the construction of intrinsic and extrinsic
functions to add previous understandings of reasoning structures. It also
provides the adaptive ability to learn models inside every module or function,
consistent with the modular structure of cognitive processes. We describe the
framework in-depth and compare it with other AI pipelines and existing
frameworks. The paper explores practical applications, covering experiments
that show the effectiveness of our method. Our results indicate that AI agents
perform and adapt far better when organised reasoning and prior knowledge are
embedded. This opens the door to more resilient and general AI agent systems.
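The abstract describes composing the agent's policy from nested intrinsic functions (internal reasoning steps operating on the agent's memory) and an extrinsic function (which emits the environment action). The following is a minimal Python sketch of that compositional idea, under stated assumptions: the names StructuredPolicy, think, and choose_action are illustrative and are not the paper's actual API.

```python
# A minimal sketch (not the paper's actual API) of a policy composed from
# nested intrinsic functions and a final extrinsic function.
from dataclasses import dataclass
from typing import Callable, List

# An intrinsic function transforms the agent's internal memory/state
# (e.g. by producing a "thought" or a retrieved tool result); the
# extrinsic function maps the final memory to an environment action.
IntrinsicFn = Callable[[dict], dict]
ExtrinsicFn = Callable[[dict], str]

@dataclass
class StructuredPolicy:
    intrinsic_fns: List[IntrinsicFn]   # ordered reasoning modules
    extrinsic_fn: ExtrinsicFn          # action-producing module

    def act(self, observation: str) -> str:
        memory = {"observation": observation, "thoughts": []}
        # Apply the intrinsic functions in sequence: each refines the
        # internal state before the extrinsic function emits the action.
        for fn in self.intrinsic_fns:
            memory = fn(memory)
        return self.extrinsic_fn(memory)

# Hypothetical modules for illustration only.
def think(memory: dict) -> dict:
    memory["thoughts"].append(f"Reasoning about: {memory['observation']}")
    return memory

def choose_action(memory: dict) -> str:
    return f"act based on {len(memory['thoughts'])} thought(s)"

policy = StructuredPolicy(intrinsic_fns=[think], extrinsic_fn=choose_action)
print(policy.act("door is locked"))  # -> "act based on 1 thought(s)"
```

In this reading, swapping or fine-tuning an individual module (e.g. replacing think with a tool-calling or retrieval step) changes the reasoning structure without rewriting the rest of the policy, which is the modularity the abstract emphasises.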