

Stochastic CHAOS: Why Deterministic Inference Kills, and Distributional Variability Is the Heartbeat of Artificial Cognition

January 12, 2026
Authors: Tanmay Joshi, Shourya Aggarwal, Anusa Saha, Aadi Pandey, Shreyash Dhoot, Vighnesh Rai, Raxit Goswami, Aman Chadha, Vinija Jain, Amitava Das
cs.AI

Abstract

Deterministic inference is a comforting ideal in classical software: the same program on the same input should always produce the same output. As large language models move into real-world deployment, this ideal has been imported wholesale into inference stacks. Recent work from the Thinking Machines Lab has presented a detailed analysis of nondeterminism in LLM inference, showing how batch-invariant kernels and deterministic attention can enforce bitwise-identical outputs, positioning deterministic inference as a prerequisite for reproducibility and enterprise reliability. In this paper, we take the opposite stance. We argue that, for LLMs, deterministic inference kills. It kills the ability to model uncertainty, suppresses emergent abilities, collapses reasoning into a single brittle path, and weakens safety alignment by hiding tail risks. LLMs implement conditional distributions over outputs, not fixed functions. Collapsing these distributions to a single canonical completion may appear reassuring, but it systematically conceals properties central to artificial cognition. We instead advocate Stochastic CHAOS, treating distributional variability as a signal to be measured and controlled. Empirically, we show that deterministic inference is systematically misleading. Single-sample deterministic evaluation underestimates both capability and fragility, masking failure probability under paraphrases and noise. Phase-like transitions associated with emergent abilities disappear under greedy decoding. Multi-path reasoning degrades when forced onto deterministic backbones, reducing accuracy and diagnostic insight. Finally, deterministic evaluation underestimates safety risk by hiding rare but dangerous behaviors that appear only under multi-sample evaluation.
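The contrast between single-sample greedy evaluation and multi-sample distributional evaluation can be made concrete. The sketch below is illustrative only and not code from the paper: `generate` is a hypothetical stand-in for an LLM call with a toy output distribution, and `failure_probability` and `self_consistency` are assumed helper names. It shows how a single greedy sample reports zero risk while multi-sample evaluation estimates the failure probability in the tail, and how majority-vote multi-path reasoning only carries information when the backbone is stochastic.

```python
import random
from collections import Counter

def generate(prompt: str, temperature: float, rng: random.Random) -> str:
    """Hypothetical stand-in for an LLM call (not an API from the paper).

    At temperature 0 it always returns the same greedy completion;
    at temperature > 0 it samples from a toy conditional distribution
    with a rare failure mode in the tail.
    """
    if temperature == 0.0:
        return "correct"  # the single canonical completion
    return rng.choices(["correct", "wrong"], weights=[0.9, 0.1])[0]

def failure_probability(prompt: str, n_samples: int,
                        temperature: float, seed: int = 0) -> float:
    """Estimate P(failure | prompt) from n_samples stochastic completions."""
    rng = random.Random(seed)
    failures = sum(generate(prompt, temperature, rng) != "correct"
                   for _ in range(n_samples))
    return failures / n_samples

def self_consistency(prompt: str, n_paths: int,
                     temperature: float, seed: int = 0) -> str:
    """Majority-vote over sampled paths; at temperature 0 every path
    collapses to the same completion, so voting adds no information."""
    rng = random.Random(seed)
    answers = [generate(prompt, temperature, rng) for _ in range(n_paths)]
    return Counter(answers).most_common(1)[0][0]

prompt = "Is 17 prime?"
# Single-sample greedy evaluation: one canonical output, zero observed risk.
print(generate(prompt, temperature=0.0, rng=random.Random(0)))
# Multi-sample evaluation exposes the tail (~10% failure rate in this toy).
print(f"estimated failure probability: {failure_probability(prompt, 200, 0.8):.2f}")
# Self-consistency over stochastic paths recovers the majority answer.
print(self_consistency(prompt, n_paths=9, temperature=0.8))
```

Under greedy decoding all three probes return the same single string, so variability-based diagnostics such as failure-probability estimates and vote margins degenerate, which is the abstract's point about deterministic evaluation hiding both fragility and tail risk.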