Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact
July 1, 2025
Authors: Rizwan Qureshi, Ranjan Sapkota, Abbas Shah, Amgad Muneer, Anas Zafar, Ashmal Vayani, Maged Shoman, Abdelrahman B. M. Eldaly, Kai Zhang, Ferhat Sadak, Shaina Raza, Xinqi Fan, Ravid Shwartz-Ziv, Hong Yan, Vinija Jain, Aman Chadha, Manoj Karkee, Jia Wu, Philip Torr, Seyedali Mirjalili
cs.AI
Abstract
Can machines truly think, reason, and act across domains as humans do? This
enduring question continues to shape the pursuit of Artificial General
Intelligence (AGI). Despite the growing capabilities of models such as GPT-4.5,
DeepSeek, Claude 3.5 Sonnet, Phi-4, and Grok 3, which exhibit multimodal
fluency and partial reasoning, these systems remain fundamentally limited by
their reliance on token-level prediction and lack of grounded agency. This
paper offers a cross-disciplinary synthesis of AGI development, spanning
artificial intelligence, cognitive neuroscience, psychology, generative models,
and agent-based systems. We analyze the architectural and cognitive foundations
of general intelligence, highlighting the role of modular reasoning, persistent
memory, and multi-agent coordination. In particular, we emphasize the rise of
Agentic RAG frameworks that combine retrieval, planning, and dynamic tool use
to enable more adaptive behavior. We discuss generalization strategies,
including information compression, test-time adaptation, and training-free
methods, as critical pathways toward flexible, domain-agnostic intelligence.
Vision-Language Models (VLMs) are reexamined not just as perception modules but
as evolving interfaces for embodied understanding and collaborative task
completion. We also argue that true intelligence arises not from scale alone
but from the integration of memory and reasoning: an orchestration of modular,
interactive, and self-improving components where compression enables adaptive
behavior. Drawing on advances in neurosymbolic systems, reinforcement learning,
and cognitive scaffolding, we explore how recent architectures begin to bridge
the gap between statistical learning and goal-directed cognition. Finally, we
identify key scientific, technical, and ethical challenges on the path to AGI.