

Reasoning Models Generate Societies of Thought

January 15, 2026
Authors: Junsol Kim, Shiyang Lai, Nino Scherrer, Blaise Agüera y Arcas, James Evans
cs.AI

Abstract

Large language models have achieved remarkable capabilities across domains, yet the mechanisms underlying sophisticated reasoning remain elusive. Recent reasoning models outperform comparable instruction-tuned models on complex cognitive tasks, an advantage commonly attributed to extended computation through longer chains of thought. Here we show that enhanced reasoning emerges not from extended computation alone, but from simulating multi-agent-like interactions -- a society of thought -- which enables diversification and debate among internal cognitive perspectives characterized by distinct personality traits and domain expertise. Through quantitative analysis and mechanistic interpretability methods applied to reasoning traces, we find that reasoning models like DeepSeek-R1 and QwQ-32B exhibit much greater perspective diversity than instruction-tuned models, activating broader conflict between heterogeneous personality- and expertise-related features during reasoning. This multi-agent structure manifests in conversational behaviors -- question-answering, perspective shifts, and the reconciliation of conflicting views -- and in socio-emotional roles characteristic of sharp back-and-forth exchanges, together accounting for the accuracy advantage on reasoning tasks. Controlled reinforcement learning experiments reveal that base models increase conversational behaviors when rewarded solely for reasoning accuracy, and that fine-tuning with conversational scaffolding accelerates reasoning improvement over base models. These findings indicate that the social organization of thought enables effective exploration of solution spaces. We suggest that reasoning models establish a computational parallel to collective intelligence in human groups, where systematically structured diversity enables superior problem-solving, opening new opportunities for agent organization to harness the wisdom of crowds.
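The abstract describes quantifying conversational behaviors (question-answering, perspective shifts, reconciliation of conflicting views) in reasoning traces. The paper's actual pipeline is not shown here; the following is a minimal illustrative sketch, assuming simple surface-level cue phrases (the cue lists and function name are hypothetical, not the authors' taxonomy), of how such markers might be counted in a trace.

```python
import re

# Hypothetical cue lists for illustration only -- not the paper's taxonomy.
QUESTION_RE = re.compile(r"\?")
SHIFT_CUES = ("wait", "alternatively", "on the other hand", "hmm")
RECONCILE_CUES = ("putting this together", "so overall", "both views agree")

def conversational_markers(trace: str) -> dict:
    """Count toy surface markers of multi-agent-like dialogue in one reasoning trace."""
    text = trace.lower()
    return {
        "questions": len(QUESTION_RE.findall(text)),
        "perspective_shifts": sum(text.count(cue) for cue in SHIFT_CUES),
        "reconciliations": sum(text.count(cue) for cue in RECONCILE_CUES),
    }

# Example trace with a self-posed question, two perspective shifts,
# and one reconciliation move.
trace = (
    "Is the answer 12? Wait, let me recheck the factor of 2. "
    "Alternatively, we could integrate directly. "
    "Putting this together, both approaches give 24."
)
print(conversational_markers(trace))
# → {'questions': 1, 'perspective_shifts': 2, 'reconciliations': 1}
```

A real analysis would need far more robust behavior detection (e.g. classifier- or feature-based, as the mechanistic interpretability methods in the paper imply); this sketch only illustrates the kind of per-trace counts that could be compared between reasoning and instruction-tuned models.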