Top-nσ: Not All Logits Are You Need

Abstract

Large language models (LLMs) typically use greedy decoding or low-temperature sampling for reasoning tasks, reflecting a perceived trade-off between diversity and accuracy. We challenge this convention by introducing top-nσ, a novel sampling method that operates directly on pre-softmax logits by applying a statistical threshold. Our key insight is that logits naturally separate into a Gaussian-distributed noisy region and a distinct informative region, which enables efficient token filtering without complex probability manipulations. Unlike existing methods (e.g., top-p, min-p), which inadvertently include more noise tokens at higher temperatures, top-nσ maintains a stable sampling space regardless of temperature scaling. We also provide a theoretical analysis of top-nσ to better understand its behavior. Extensive experiments on four reasoning-focused datasets show that our method not only outperforms existing sampling approaches but also surpasses greedy decoding, while maintaining consistent performance even at high temperatures.
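The abstract does not spell out the threshold itself, so the following is only a minimal sketch of the idea, under the assumption (suggested by the method's name) that a token is kept when its logit lies within n standard deviations of the maximum logit. The function names `top_n_sigma_mask` and `sample_top_n_sigma`, the default `n=1.0`, and the synthetic logits are illustrative choices, not taken from the source. The loop at the end illustrates the temperature claim: dividing the logits by a temperature rescales both the maximum and the standard deviation by the same factor, so the set of kept tokens should not change.

```python
import numpy as np


def top_n_sigma_mask(logits: np.ndarray, n: float = 1.0) -> np.ndarray:
    """Boolean mask of tokens kept by the assumed rule: logit >= max - n * std."""
    threshold = logits.max() - n * logits.std()
    return logits >= threshold


def sample_top_n_sigma(logits, n=1.0, temperature=1.0, rng=None):
    """Sample one token index from the softmax restricted to the kept tokens."""
    rng = np.random.default_rng() if rng is None else rng
    scaled = np.asarray(logits, dtype=float) / temperature
    mask = top_n_sigma_mask(scaled, n)
    # Filtered tokens get -inf, so they receive exactly zero probability.
    masked = np.where(mask, scaled, -np.inf)
    probs = np.exp(masked - masked.max())  # subtract the max for numerical stability
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))


# Synthetic logits (an assumption for illustration): a Gaussian "noisy region"
# of 1000 tokens plus three clearly separated "informative" tokens.
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 1.0, size=1000)
informative = np.array([9.0, 9.5, 10.0])
logits = np.concatenate([noise, informative])

for t in (0.5, 1.0, 2.0, 5.0):
    kept = np.flatnonzero(top_n_sigma_mask(logits / t, n=1.0))
    print(f"temperature={t}: kept token indices {kept.tolist()}")
```

In this sketch the filter is applied to the raw logits before any softmax, so no sorting or cumulative probability sums are needed; temperature then only reshapes the distribution over the tokens that survive the filter.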
