Spilled Energy in Large Language Models
February 21, 2026
Authors: Adrian Robert Minut, Hazem Dewidar, Iacopo Masi
cs.AI
Abstract
We reinterpret the final Large Language Model (LLM) softmax classifier as an Energy-Based Model (EBM), decomposing the sequence-to-sequence probability chain into multiple interacting EBMs at inference. This principled approach allows us to track "energy spills" during decoding, which we empirically show correlate with factual errors, biases, and failures. Similar to Orgad et al. (2025), our method localizes the exact answer token and subsequently tests for hallucinations. Crucially, however, we achieve this without requiring trained probe classifiers or activation ablations. Instead, we introduce two completely training-free metrics derived directly from output logits: spilled energy, which captures the discrepancy between energy values across consecutive generation steps that should theoretically match, and marginalized energy, which is measurable at a single step. Evaluated on nine benchmarks across state-of-the-art LLMs (including LLaMA, Mistral, and Gemma) and on synthetic algebraic operations (Qwen3), our approach demonstrates robust, competitive hallucination detection and cross-task generalization. Notably, these results hold for both pretrained and instruction-tuned variants without introducing any training overhead. Code available at: github.com/OmnAI-Lab/spilled-energy
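The abstract does not spell out the formulas, but under the standard EBM reading of a softmax classifier, where the energy of token y is the negative logit E(y) = -f(x)[y] and the marginalized (free) energy is F(x) = -logsumexp over the logits, the two quantities might be sketched as below. The function names marginalized_energy and spilled_energy, and the exact pairing of consecutive-step terms, are assumptions for illustration, not the paper's definitions.

```python
import torch

def marginalized_energy(logits: torch.Tensor) -> torch.Tensor:
    # Free-energy reading of a softmax classifier:
    # F = -logsumexp(logits over the vocabulary), one scalar per step.
    # This is the single-step quantity the abstract calls "marginalized energy"
    # (assumed definition).
    return -torch.logsumexp(logits, dim=-1)

def spilled_energy(logits_t: torch.Tensor, token_t: int,
                   logits_next: torch.Tensor) -> torch.Tensor:
    # Hypothetical proxy for the "spill": compare the energy the model
    # assigns to the emitted token at step t (E = -logit of that token)
    # with the free energy of the distribution produced once the token
    # is appended. Under the chain decomposition these terms are coupled;
    # a large gap is read here as energy "spilling" between steps.
    e_token = -logits_t[token_t]
    f_next = marginalized_energy(logits_next)
    return e_token - f_next

# Toy usage with random logits standing in for a model's per-step outputs.
vocab = 32
logits_t = torch.randn(vocab)
token_t = int(torch.argmax(logits_t))  # greedy choice at step t
logits_next = torch.randn(vocab)       # logits after appending token_t
print(marginalized_energy(logits_t).item(),
      spilled_energy(logits_t, token_t, logits_next).item())
```

Both quantities are computed from output logits alone, consistent with the abstract's claim that no probe training or activation ablation is required; for the paper's exact formulation, see the released code at github.com/OmnAI-Lab/spilled-energy.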