
大型语言模型中的能量耗散现象

Spilled Energy in Large Language Models

February 21, 2026
Authors: Adrian Robert Minut, Hazem Dewidar, Iacopo Masi
cs.AI

Abstract

We reinterpret the final Large Language Model (LLM) softmax classifier as an Energy-Based Model (EBM), decomposing the sequence-to-sequence probability chain into multiple interacting EBMs at inference. This principled approach allows us to track "energy spills" during decoding, which we empirically show correlate with factual errors, biases, and failures. Similar to Orgad et al. (2025), our method localizes the exact answer token and subsequently tests for hallucinations. Crucially, however, we achieve this without requiring trained probe classifiers or activation ablations. Instead, we introduce two completely training-free metrics derived directly from output logits: spilled energy, which captures the discrepancy between energy values across consecutive generation steps that should theoretically match, and marginalized energy, which is measurable at a single step. Evaluated on nine benchmarks across state-of-the-art LLMs (including LLaMA, Mistral, and Gemma) and on synthetic algebraic operations (Qwen3), our approach demonstrates robust, competitive hallucination detection and cross-task generalization. Notably, these results hold for both pretrained and instruction-tuned variants without introducing any training overhead. Code available at: github.com/OmnAI-Lab/spilled-energy
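
The abstract does not give formal definitions of the two metrics, so the following is a minimal sketch of one plausible reading, assuming the standard energy-based view of a softmax classifier (energy of a token = its negative logit; free energy = negative logsumexp over the vocabulary). The function name `step_energies` and the exact forms of the spill and marginalized-energy quantities below are illustrative assumptions, not the paper's published definitions.

```python
import torch

def step_energies(logits: torch.Tensor, token_ids: torch.Tensor):
    """Hypothetical per-step energy quantities from output logits.

    Assumes the standard EBM reading of a softmax classifier:
    energy = -logit, free energy = -logsumexp(logits).

    logits:    (T, V) float tensor; logits[t] scores the token at position t+1
    token_ids: (T,)   long tensor; the tokens actually generated
    """
    # Token energy: energy assigned to the generated token at each step.
    token_energy = -logits.gather(1, token_ids.unsqueeze(1)).squeeze(1)  # (T,)

    # "Marginalized" energy (assumed form): marginalizes over all candidate
    # next tokens, so it is measurable from a single step's logits.
    marginalized_energy = -torch.logsumexp(logits, dim=-1)  # (T,)

    # "Spilled" energy (assumed form): under a self-consistent chain of EBMs,
    # the energy the model assigns to token x_{t+1} at step t should match the
    # marginalized energy it reports one step later; the gap is the spill.
    spilled_energy = (token_energy[:-1] - marginalized_energy[1:]).abs()  # (T-1,)

    return token_energy, marginalized_energy, spilled_energy

# Example with teacher-forced logits from any Hugging Face causal LM:
#   out = model(input_ids).logits      # (1, T, V)
#   logits = out[0, :-1, :]            # scores for tokens 1..T-1
#   tokens = input_ids[0, 1:]          # the tokens those logits predict
#   tok_e, marg_e, spill = step_energies(logits, tokens)
```

Because both quantities are computed directly from the logits the model already produces, a detector of this shape would indeed require no probe training or activation ablation, consistent with the training-free claim in the abstract.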