Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact

July 1, 2025
Authors: Rizwan Qureshi, Ranjan Sapkota, Abbas Shah, Amgad Muneer, Anas Zafar, Ashmal Vayani, Maged Shoman, Abdelrahman B. M. Eldaly, Kai Zhang, Ferhat Sadak, Shaina Raza, Xinqi Fan, Ravid Shwartz-Ziv, Hong Yan, Vinjia Jain, Aman Chadha, Manoj Karkee, Jia Wu, Philip Torr, Seyedali Mirjalili
cs.AI

Abstract

Can machines truly think, reason and act in domains like humans? This enduring question continues to shape the pursuit of Artificial General Intelligence (AGI). Despite the growing capabilities of models such as GPT-4.5, DeepSeek, Claude 3.5 Sonnet, Phi-4, and Grok 3, which exhibit multimodal fluency and partial reasoning, these systems remain fundamentally limited by their reliance on token-level prediction and lack of grounded agency. This paper offers a cross-disciplinary synthesis of AGI development, spanning artificial intelligence, cognitive neuroscience, psychology, generative models, and agent-based systems. We analyze the architectural and cognitive foundations of general intelligence, highlighting the role of modular reasoning, persistent memory, and multi-agent coordination. In particular, we emphasize the rise of Agentic RAG frameworks that combine retrieval, planning, and dynamic tool use to enable more adaptive behavior. We discuss generalization strategies, including information compression, test-time adaptation, and training-free methods, as critical pathways toward flexible, domain-agnostic intelligence. Vision-Language Models (VLMs) are reexamined not just as perception modules but as evolving interfaces for embodied understanding and collaborative task completion. We also argue that true intelligence arises not from scale alone but from the integration of memory and reasoning: an orchestration of modular, interactive, and self-improving components where compression enables adaptive behavior. Drawing on advances in neurosymbolic systems, reinforcement learning, and cognitive scaffolding, we explore how recent architectures begin to bridge the gap between statistical learning and goal-directed cognition. Finally, we identify key scientific, technical, and ethical challenges on the path to AGI.