

Reasoning Models Generate Societies of Thought

January 15, 2026
Authors: Junsol Kim, Shiyang Lai, Nino Scherrer, Blaise Agüera y Arcas, James Evans
cs.AI

Abstract

Large language models have achieved remarkable capabilities across domains, yet the mechanisms underlying sophisticated reasoning remain elusive. Recent reasoning models outperform comparable instruction-tuned models on complex cognitive tasks, an advantage typically attributed to extended computation through longer chains of thought. Here we show that enhanced reasoning emerges not from extended computation alone, but from simulating multi-agent-like interactions -- a society of thought -- which enables diversification and debate among internal cognitive perspectives characterized by distinct personality traits and domain expertise. Through quantitative analysis and mechanistic interpretability methods applied to reasoning traces, we find that reasoning models like DeepSeek-R1 and QwQ-32B exhibit much greater perspective diversity than instruction-tuned models, activating broader conflict between heterogeneous personality- and expertise-related features during reasoning. This multi-agent structure manifests in conversational behaviors, including question-answering, perspective shifts, and the reconciliation of conflicting views, and in socio-emotional roles that characterize sharp back-and-forth conversations, together accounting for the accuracy advantage in reasoning tasks. Controlled reinforcement learning experiments reveal that base models increase conversational behaviors when rewarded solely for reasoning accuracy, and that fine-tuning models with conversational scaffolding accelerates reasoning improvement over base models. These findings indicate that the social organization of thought enables effective exploration of solution spaces. We suggest that reasoning models establish a computational parallel to collective intelligence in human groups, where diversity enables superior problem-solving when systematically structured, pointing to new opportunities for agent organization to harness the wisdom of crowds.
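To make the kind of trace analysis described above concrete, here is a minimal, hypothetical sketch of tallying conversational behaviors (question-answering, perspective shifts, reconciliation) in a reasoning trace via simple marker lexicons. The marker patterns and function names are illustrative assumptions, not the authors' actual measurement pipeline, which relies on quantitative and mechanistic interpretability methods.

```python
import re

# Hypothetical marker lexicons for the conversational behaviors the paper
# describes in reasoning traces (illustrative only, not the authors' lexicon).
MARKERS = {
    "question_answering": [r"\bwait\b", r"\?\s"],
    "perspective_shift": [r"\balternatively\b", r"\bon the other hand\b", r"\bbut\b"],
    "reconciliation": [r"\bso overall\b", r"\bcombining\b", r"\breconcil\w*"],
}

def count_behaviors(trace: str) -> dict:
    """Count occurrences of each conversational-behavior marker in a trace."""
    text = trace.lower()
    return {
        behavior: sum(len(re.findall(pattern, text)) for pattern in patterns)
        for behavior, patterns in MARKERS.items()
    }

# A toy reasoning trace exhibiting self-questioning and perspective shifts.
trace = ("Is 17 prime? Let me check divisors. Wait, what about 3? "
         "Alternatively, trial division up to sqrt(17). But 4^2 = 16 < 17. "
         "So overall, 17 is prime.")
print(count_behaviors(trace))
# → {'question_answering': 3, 'perspective_shift': 2, 'reconciliation': 1}
```

A real pipeline would replace the regex lexicons with model-based annotation of each reasoning step, but the per-behavior tally is the same shape of statistic one would then correlate with task accuracy.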