Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning

Abstract

A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL). However, constructing a standalone RL policy that directly maps perception to action encounters severe problems, chief among them a lack of generality across multiple tasks and the need for a large amount of training data. The leading cause is that such a policy cannot effectively integrate prior information into the perception-action cycle. Large language models (LLMs) have emerged as a fundamental way to incorporate cross-domain knowledge into AI agents, but they lack the crucial ability to learn and adapt to specific decision problems. This paper presents a general framework for integrating structured reasoning into AI agents' policies and for learning that reasoning. Our methodology is motivated by the modularity found in the human brain. The framework uses intrinsic and extrinsic functions to encode prior understanding of reasoning structures. It also provides the adaptive ability to learn models inside every module or function, consistent with the modular structure of cognitive processes. We describe the framework in depth and compare it with other AI pipelines and existing frameworks. The paper also explores practical applications, including experiments that show the effectiveness of our method. Our results indicate that AI agents perform and adapt far better when structured reasoning and prior knowledge are embedded. This opens the door to more resilient and general AI agent systems.
