How Alignment Shrinks the Generative Horizon
June 22, 2025
Authors: Chenghao Yang, Ari Holtzman
cs.AI
Abstract
Despite their impressive capabilities, aligned large language models (LLMs) often generate outputs that lack diversity. What drives this stability in generation? We investigate this phenomenon through the lens of probability concentration in the model's output distribution. To quantify this concentration, we introduce the Branching Factor (BF) -- a token-invariant measure of the effective number of plausible next steps during generation. Our empirical analysis reveals two key findings: (1) BF often decreases as generation progresses, suggesting that LLMs become more predictable as they generate; (2) alignment tuning substantially sharpens the model's output distribution from the outset, reducing BF by nearly an order of magnitude (e.g., from 12 to 1.2) relative to base models. This stark reduction helps explain why aligned models often appear less sensitive to decoding strategies. Building on this insight, we find that this stability has surprising implications for complex reasoning. Aligned Chain-of-Thought (CoT) models (e.g., DeepSeek-distilled models), for instance, leverage this effect: by generating longer reasoning chains, they push generation into later, more deterministic (lower-BF) stages, resulting in more stable outputs. We hypothesize that alignment tuning does not fundamentally change a model's behavior, but instead steers it toward stylistic tokens (e.g., "Sure") that unlock low-entropy trajectories already present in the base model. This view is supported by nudging experiments, which show that prompting base models with such tokens can similarly reduce BF. Together, our findings establish BF as a powerful diagnostic for understanding and controlling LLM outputs -- clarifying how alignment reduces variability, how CoT promotes stable generations, and how base models can be steered away from diversity.
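
To make the idea concrete, the sketch below approximates a per-step branching factor as exp(entropy) of the model's next-token distribution, i.e., the effective number of plausible next tokens at each step. This is an illustrative proxy under stated assumptions, not the paper's exact measurement pipeline; the model name ("gpt2") and the "Sure," nudge prefix are placeholder choices used only to show how nudging a base model toward assistant-style tokens could be probed.

```python
# Minimal sketch (assumptions noted above): estimate a per-step "branching factor"
# as exp(entropy) of the next-token distribution during greedy decoding.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in base LM; swap in any causal LM
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def stepwise_branching_factor(prompt: str, max_new_tokens: int = 30):
    """Greedy-decode and record exp(entropy) of each next-token distribution."""
    ids = tok(prompt, return_tensors="pt").input_ids
    bfs = []
    with torch.no_grad():
        for _ in range(max_new_tokens):
            logits = model(ids).logits[0, -1]                 # next-token logits
            probs = torch.softmax(logits, dim=-1)
            entropy = -(probs * torch.log(probs + 1e-12)).sum()
            bfs.append(torch.exp(entropy).item())             # effective number of choices
            next_id = probs.argmax().reshape(1, 1)            # greedy next token
            ids = torch.cat([ids, next_id], dim=-1)
    return bfs

prompt = "Explain why the sky is blue."
print("plain prompt BF (first steps):", stepwise_branching_factor(prompt)[:5])
# Nudging the base model with an assistant-style token, as described in the abstract,
# should lower the measured BF if it steers generation onto a low-entropy trajectory.
print("nudged prompt BF (first steps):", stepwise_branching_factor(prompt + "\nSure, ")[:5])
```

Comparing the two printed traces gives a quick, informal check of the abstract's claims that BF tends to fall as generation progresses and that assistant-style prefixes can push a base model toward lower-BF trajectories.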