From Chatbot to Digital Colleague: The Paradigm Shift Toward Persistent Autonomous AI
June 12, 2026
Authors: Yongheng Zhang, Ziang Liu, Jiaxuan Zhu, Shuai Wang, Xiangqi Chen, Haojing Huang, Jiayi Kuang, Siyu Chen, Ao Shen, Hao Wu, Qiufeng Wang, Qian-Wen Zhang, Junnan Dong, Wenhao Jiang, Ying Shen, Hai-Tao Zheng, Yinghui Li, Di Yin, Xing Sun, Philip S. Yu
cs.AI
Abstract
Large Language Models (LLMs) are undergoing a fundamental transformation from conversational generators into integrated AI systems capable of reasoning, action, memory, and self-improvement. We conceptualize this transition as a shift from Chatbot to Digital Colleague: from conversational answers to persistent work. We organize the transition along two tightly coupled dimensions. First, at the cognitive core level, LLMs are advancing from Chatbot-era "fast thinking" systems driven by next-token prediction toward Thinking LLMs that leverage inference-time computation, Chain-of-Thought reasoning, reflection, process supervision, and reinforcement learning to support more deliberate and reliable cognition. Second, at the tool-augmented task execution level, LLMs are progressing from tool-calling Agents that invoke external resources in an ad hoc manner toward OpenClaw-style workstation systems (OpenClaw) equipped with persistent Workspaces, skills, verification loops, and governance mechanisms. The "Workspace + Skill" paradigm turns episodic tool use into colleague-like collaboration through state persistence, reusable procedures, task closure, and experience reuse. We also examine how data construction shifts from instruction-response pairs to State-Action-Observation trajectories, and how evaluation evolves from static benchmarks to sandboxed, auditable, self-evolving AI ecosystems.
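To make the data-construction shift concrete, the following is a minimal sketch contrasting an instruction-response pair with a State-Action-Observation trajectory. It is an illustration only, not the authors' data format: the class names, field names, and the toy workspace dictionary are all assumptions introduced here.

```python
import copy
from dataclasses import dataclass, field
from typing import Any


@dataclass
class InstructionResponsePair:
    """Chatbot-style sample: one prompt, one answer, no environment state."""
    instruction: str
    response: str


@dataclass
class Step:
    """One State-Action-Observation step (hypothetical field names)."""
    state: dict[str, Any]   # snapshot of the workspace before acting
    action: str             # what the agent did, e.g. a tool call
    observation: str        # what the environment returned


@dataclass
class Trajectory:
    """Agent-style sample: a task plus the ordered steps taken on it."""
    task: str
    steps: list[Step] = field(default_factory=list)

    def record(self, state: dict[str, Any], action: str, observation: str) -> None:
        # Deep-copy the state so later workspace changes do not rewrite history.
        self.steps.append(Step(copy.deepcopy(state), action, observation))


# A single-turn sample carries no record of how the answer was produced.
pair = InstructionResponsePair("Summarize report.txt", "The report says ...")

# A trajectory keeps the evolving workspace state alongside each action.
workspace = {"files": ["report.txt"]}
traj = Trajectory(task="Summarize report.txt")
traj.record(workspace, "read_file('report.txt')", "<file contents>")
workspace["files"].append("summary.md")
traj.record(workspace, "write_file('summary.md')", "ok")
```

Because each step stores its own snapshot of the state, a trajectory can be replayed or audited step by step, which a bare instruction-response pair does not allow.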