Generative AI Act II: Test-Time Scaling Drives Cognition Engineering

Abstract

The first generation of large language models, what might be called "Act I" of generative AI (2020-2023), achieved remarkable success through massive parameter and data scaling, yet exhibited fundamental limitations in knowledge latency, shallow reasoning, and constrained cognitive processes. During this era, prompt engineering emerged as our primary interface with AI, enabling dialogue-level communication through natural language. We are now witnessing the emergence of "Act II" (2024-present), in which models are transitioning from knowledge-retrieval systems (in latent space) to thought-construction engines through test-time scaling techniques. This new paradigm establishes a mind-level connection with AI through language-based thoughts. In this paper, we clarify the conceptual foundations of cognition engineering and explain why this moment is critical for its development. We systematically break down these advanced approaches through comprehensive tutorials and optimized implementations, democratizing access to cognition engineering and enabling every practitioner to participate in AI's second act. We maintain a regularly updated collection of papers on test-time scaling in our GitHub repository: https://github.com/GAIR-NLP/cognition-engineering