Spilled Energy in Large Language Models

Abstract

We reinterpret the final softmax classifier of a Large Language Model (LLM) as an Energy-Based Model (EBM), decomposing the sequence-to-sequence probability chain into multiple interacting EBMs at inference. This principled approach allows us to track "energy spills" during decoding, which, as we show empirically, correlate with factual errors, biases, and failures. Similar to Orgad et al. (2025), our method localizes the exact answer token and then tests it for hallucinations. Crucially, however, we achieve this without trained probe classifiers or activation ablations. Instead, we introduce two completely training-free metrics derived directly from the output logits: spilled energy, which captures the discrepancy between energy values at consecutive generation steps that should theoretically match, and marginalized energy, which is measurable at a single step. Evaluated on nine benchmarks across state-of-the-art LLMs (including LLaMA, Mistral, and Gemma) and on synthetic algebraic operations (Qwen3), our approach demonstrates robust, competitive hallucination detection and cross-task generalization. Notably, these results hold for both pretrained and instruction-tuned variants without introducing any training overhead. Code is available at github.com/OmnAI-Lab/spilled-energy.
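The two metrics can be illustrated with a small sketch. The abstract does not give the exact formulas, so this assumes one plausible reading: the logit of the chosen token at step t is treated as the negative energy of the prefix y≤t, and the log-partition (logsumexp) of the logits at step t+1 as the negative energy of that same prefix after marginalizing over the next token. Spilled energy is then the difference between these two values, which would be zero if the two views agreed, and marginalized energy is the single-step free energy, the negative logsumexp of one step's logits. The function names `marginalized_energy` and `spilled_energy` and both formulas are hypothetical and are not taken from the authors' repository.

```python
import math
from typing import List, Sequence


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x))) over one vector of logits."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def marginalized_energy(step_logits: Sequence[float]) -> float:
    """Hypothetical single-step metric: free energy E = -logsumexp(logits).

    Assumption: this mirrors the 'marginalized energy' named in the abstract;
    the paper's exact definition may differ.
    """
    return -logsumexp(step_logits)


def spilled_energy(logits: List[Sequence[float]], tokens: Sequence[int]) -> List[float]:
    """Hypothetical per-step spill between two energies of the same prefix.

    logits[t] holds the raw output logits at step t and tokens[t] the index
    of the token generated at step t. For each t < T - 1 it compares:
      e_token = -logits[t][tokens[t]]          (energy of the chosen token)
      e_marg  = marginalized_energy(logits[t+1])  (energy marginalized at t+1)
    and returns the signed difference e_token - e_marg.
    """
    spills = []
    for t in range(len(tokens) - 1):
        e_token = -logits[t][tokens[t]]
        e_marg = marginalized_energy(logits[t + 1])
        spills.append(e_token - e_marg)
    return spills
```

For example, with `logits = [[2.0, 0.5, -1.0], [1.0, 1.0, 0.0], [0.0, 3.0, 0.5]]` and `tokens = [0, 1, 1]`, `spilled_energy(logits, tokens)` returns two values, roughly -0.14 and 2.12. The second, larger mismatch is the kind of step a detector would flag. Keeping the sign preserves the direction of the mismatch. Whether to take absolute values, and whether to aggregate only over the localized answer tokens as the abstract describes, are design choices that a real detector would need to settle.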
