Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning

Abstract

A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL). However, constructing a standalone RL policy that maps perception to action directly encounters severe problems, chief among them being its lack of generality across multiple tasks and the need for a large amount of training data. The leading cause is that it cannot effectively integrate prior information into the perception-action cycle when devising the policy. Large language models (LLMs) emerged as a fundamental way to incorporate cross-domain knowledge into AI agents but lack crucial learning and adaptation toward specific decision problems. This paper presents a general framework model for integrating and learning structured reasoning into AI agents' policies. Our methodology is motivated by the modularity found in the human brain. The framework utilises the construction of intrinsic and extrinsic functions to add previous understandings of reasoning structures. It also provides the adaptive ability to learn models inside every module or function, consistent with the modular structure of cognitive processes. We describe the framework in depth and compare it with other AI pipelines and existing frameworks. The paper explores practical applications, covering experiments that show the effectiveness of our method. Our results indicate that AI agents perform and adapt far better when organised reasoning and prior knowledge are embedded. This opens the door to more resilient and general AI agent systems.
