Stochastic CHAOS: Why Deterministic Inference Kills, and Distributional Variability Is the Heartbeat of Artificial Cognition
January 12, 2026
Authors: Tanmay Joshi, Shourya Aggarwal, Anusa Saha, Aadi Pandey, Shreyash Dhoot, Vighnesh Rai, Raxit Goswami, Aman Chadha, Vinija Jain, Amitava Das
cs.AI
Abstract
Deterministic inference is a comforting ideal in classical software: the same program on the same input should always produce the same output. As large language models move into real-world deployment, this ideal has been imported wholesale into inference stacks. Recent work from the Thinking Machines Lab has presented a detailed analysis of nondeterminism in LLM inference, showing how batch-invariant kernels and deterministic attention can enforce bitwise-identical outputs, positioning deterministic inference as a prerequisite for reproducibility and enterprise reliability.
In this paper, we take the opposite stance. We argue that, for LLMs, deterministic inference kills. It kills the ability to model uncertainty, suppresses emergent abilities, collapses reasoning into a single brittle path, and weakens safety alignment by hiding tail risks. LLMs implement conditional distributions over outputs, not fixed functions. Collapsing these distributions to a single canonical completion may appear reassuring, but it systematically conceals properties central to artificial cognition. We instead advocate Stochastic CHAOS, treating distributional variability as a signal to be measured and controlled.
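The distinction between a fixed function and a conditional distribution can be made concrete with a toy sketch. This is not the paper's method, just an illustrative example with made-up logits: a softmax over next-token logits defines a distribution, greedy decoding always returns its argmax, and repeated sampling recovers the probability mass that greedy decoding discards.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical next-token logits for one context: the "model" here
# is a distribution over tokens, not a fixed input-output mapping.
logits = [2.0, 1.5, 0.5]
probs = softmax(logits)

def greedy(probs):
    # Deterministic inference: always emit the argmax token.
    return max(range(len(probs)), key=lambda i: probs[i])

def sample(probs, rng):
    # Stochastic inference: draw a token from the full distribution.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

rng = random.Random(0)
draws = [sample(probs, rng) for _ in range(10_000)]
freq = [draws.count(i) / len(draws) for i in range(len(probs))]

# greedy(probs) is the same token on every call, while `freq`
# approximates `probs`: the non-argmax mass is exactly what
# collapsing to a single canonical completion conceals.
```

The sketch is deliberately minimal: in a real LLM the distribution is over full completions conditioned on the prompt, but the collapse from distribution to single path is the same.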
Empirically, we show that deterministic inference is systematically misleading. Single-sample deterministic evaluation underestimates both capability and fragility, masking failure probability under paraphrases and noise. Phase-like transitions associated with emergent abilities disappear under greedy decoding. Multi-path reasoning degrades when forced onto deterministic backbones, reducing accuracy and diagnostic insight. Finally, deterministic evaluation underestimates safety risk by hiding rare but dangerous behaviors that appear only under multi-sample evaluation.
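The evaluation gap described above can be sketched with a simulated model. The success probability and sample count below are made-up illustration values, not the paper's experimental setup: a single sample yields a 0/1 verdict on each prompt, while a multi-sample estimate recovers the underlying failure probability that the single verdict hides.

```python
import random

def multi_sample_estimate(success_prob, n, rng):
    """Estimate a per-prompt success rate from n stochastic samples
    of a simulated model with known true success probability."""
    hits = sum(rng.random() < success_prob for _ in range(n))
    return hits / n

rng = random.Random(42)

# Hypothetical prompt on which the model succeeds 95% of the time:
# a single sample almost always reports "correct", suggesting zero risk.
true_p = 0.95
single_verdict = rng.random() < true_p       # one-shot 0/1 judgment

# Multi-sample evaluation instead estimates the distribution,
# exposing the ~5% tail failure rate the single verdict masks.
est = multi_sample_estimate(true_p, 1000, rng)
failure_rate = 1 - est
```

The same logic applies to safety evaluation: a rare harmful completion with small but nonzero probability is invisible to a single deterministic decode, and only surfaces as its frequency across many samples.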