LLM as a Noisy Channel: A Shannon Perspective on Model Capacity and Scaling Laws

Abstract

Existing scaling laws for large language models (LLMs), predominantly monotonic power laws, fail to explain emerging non-monotonic phenomena such as catastrophic overtraining and quantization-induced degradation, where performance deteriorates despite increased compute. We propose the Shannon Scaling Law, a unified theoretical framework that models LLM training as information transmission over a noisy channel, grounded in the Shannon-Hartley theorem. By mapping model parameters to channel bandwidth and training tokens to signal power, our formulation explicitly captures the interaction between learning signal and intrinsic noise. This perspective reveals a fundamental Shannon capacity for LLMs: scaling model size or data without preserving a sufficient signal-to-noise ratio (SNR) inevitably amplifies noise, inducing a transition from monotonic improvement to U-shaped performance degradation. We validate our theory through experiments on Pythia and OLMo2 under perturbations, including Gaussian noise, quantization, and supervised fine-tuning on math, QA, and code tasks. The Shannon Scaling Law consistently outperforms classical scaling laws and recent perturbation-aware laws, achieving strong R² scores and accurately capturing loss basins missed by prior approaches. It also extrapolates: fitted on Pythia models with ≤6.9B parameters and ≤180B tokens, it predicts the unseen 12B model up to 307B tokens with a pooled R² of 0.847, while monotonic baselines collapse.
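
For orientation, the textbook Shannon-Hartley theorem that the framework builds on gives the capacity C = B log2(1 + S/N) for bandwidth B, signal power S, and noise power N. The sketch below is a minimal illustration of that standard formula only; it is not the paper's fitted law. The function names and the white-noise assumption (noise power growing as N0 * B) are ours, introduced for illustration. Under the abstract's analogy, B plays the role of model parameters and S the role of training tokens.

```python
import math


def shannon_hartley_capacity(bandwidth, signal_power, noise_power):
    """Textbook Shannon-Hartley capacity C = B * log2(1 + S/N)."""
    if bandwidth <= 0 or signal_power < 0 or noise_power <= 0:
        raise ValueError("need bandwidth > 0, signal_power >= 0, noise_power > 0")
    return bandwidth * math.log2(1.0 + signal_power / noise_power)


def capacity_white_noise(bandwidth, signal_power, noise_density):
    """Capacity when noise power scales with bandwidth: N = N0 * B.

    Assumption for illustration: additive white noise with density N0,
    so widening the channel also admits more noise and lowers the SNR.
    """
    return shannon_hartley_capacity(
        bandwidth, signal_power, noise_density * bandwidth
    )


if __name__ == "__main__":
    signal_power = 1.0   # fixed "signal" (tokens, in the abstract's analogy)
    noise_density = 1.0  # hypothetical noise density N0
    limit = signal_power / (noise_density * math.log(2))
    for bandwidth in (1.0, 10.0, 100.0, 1000.0):
        c = capacity_white_noise(bandwidth, signal_power, noise_density)
        print(f"B={bandwidth:7.1f}  SNR={signal_power / (noise_density * bandwidth):.4f}  C={c:.4f}")
    print(f"wideband limit S/(N0 ln 2) = {limit:.4f}")
```

With signal power held fixed, widening the bandwidth lowers the SNR and the capacity saturates at S/(N0 ln 2) instead of growing without bound. This standard setting shows only saturation; the U-shaped degradation described in the abstract depends on the paper's own formulation of noise and loss, which is not reproduced here.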