Decoding as Optimisation on the Probability Simplex: From Top-K to Top-P (Nucleus) to Best-of-K Samplers

Abstract

Decoding sits between a language model and everything we do with it, yet it is still largely treated as a heuristic knob-tuning exercise. We argue that decoding should instead be understood as a principled optimisation layer: at each token, we solve a regularised problem over the probability simplex that trades off model score against structural preferences and constraints. This single template recovers greedy decoding, Softmax sampling, Top-K, Top-P, and Sparsemax-style sparsity as special cases, and explains their common structure through optimality conditions. More importantly, the framework makes it possible to design new decoders without relying on folklore. We demonstrate this by designing Best-of-K (BoK), a KL-anchored coverage objective aimed at multi-sample pipelines (self-consistency, reranking, verifier selection). BoK targets the probability of covering good alternatives within a fixed K-sample budget and improves empirical performance. For example, BoK samples improve the accuracy of Qwen2.5-Math-7B on MATH500 by +18.6% at high sampling temperatures.
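To make the "special cases" claim concrete, here is a minimal Python sketch of the textbook versions of four of these decoders, each written as the solution of a small problem over the probability simplex. It is an illustration under stated assumptions, not the authors' formulation: the function names, the temperature parameter, and the exact objectives in the docstrings are hypothetical choices made for this example. BoK is not sketched because the abstract does not state its objective.

```python
import math

def softmax(scores, temperature=1.0):
    """Maximiser of <p, s> + T * H(p) over the simplex (entropy regulariser)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def top_k(scores, k, temperature=1.0):
    """Same entropy-regularised objective, with support limited to k tokens."""
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    sub = softmax([scores[i] for i in keep], temperature)
    p = [0.0] * len(scores)
    for i, q in zip(keep, sub):
        p[i] = q
    return p

def top_p(scores, mass, temperature=1.0):
    """Keep the smallest high-probability set reaching `mass`, then renormalise."""
    probs = softmax(scores, temperature)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, total = [], 0.0
    for i in order:
        keep.append(i)
        total += probs[i]
        if total >= mass:
            break
    p = [0.0] * len(scores)
    for i in keep:
        p[i] = probs[i] / total
    return p

def sparsemax(scores):
    """Euclidean projection of the score vector onto the simplex."""
    z = sorted(scores, reverse=True)
    cumsum, tau = 0.0, 0.0
    for j, zj in enumerate(z, start=1):
        cumsum += zj
        if 1 + j * zj > cumsum:  # token j is still inside the support
            tau = (cumsum - 1) / j
    return [max(s - tau, 0.0) for s in scores]

if __name__ == "__main__":
    s = [2.0, 1.0, 0.1, -1.0]
    for name, p in [("softmax", softmax(s)), ("top_k", top_k(s, 2)),
                    ("top_p", top_p(s, 0.9)), ("sparsemax", sparsemax(s))]:
        print(name, [round(x, 3) for x in p])
```

Greedy decoding corresponds to dropping the regulariser (or letting the temperature go to zero), which puts all mass on the highest-scoring token. In this sketch the four functions differ only in the regulariser or the support constraint.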
