Agentic Reasoning for Large Language Models

Abstract

Reasoning is a fundamental cognitive process underlying inference, problem-solving, and decision-making. While large language models (LLMs) demonstrate strong reasoning capabilities in closed-world settings, they struggle in open-ended and dynamic environments. Agentic reasoning marks a paradigm shift by reframing LLMs as autonomous agents that plan, act, and learn through continual interaction. In this survey, we organize agentic reasoning along three complementary dimensions. First, we characterize environmental dynamics through three layers: foundational agentic reasoning, which establishes core single-agent capabilities including planning, tool use, and search in stable environments; self-evolving agentic reasoning, which studies how agents refine these capabilities through feedback, memory, and adaptation; and collective multi-agent reasoning, which extends intelligence to collaborative settings involving coordination, knowledge sharing, and shared goals. Across these layers, we distinguish in-context reasoning, which scales test-time interaction through structured orchestration, from post-training reasoning, which optimizes behaviors via reinforcement learning and supervised fine-tuning. We further review representative agentic reasoning frameworks across real-world applications and benchmarks, including science, robotics, healthcare, autonomous research, and mathematics. This survey synthesizes agentic reasoning methods into a unified roadmap bridging thought and action, and outlines open challenges and future directions, including personalization, long-horizon interaction, world modeling, scalable multi-agent training, and governance for real-world deployment.