Large Language Models Explore by Latent Distilling

Abstract

Generating diverse responses is crucial for test-time scaling of large language models (LLMs), yet standard stochastic sampling mostly yields surface-level lexical variation, limiting semantic exploration. In this paper, we propose Exploratory Sampling (ESamp), a decoding approach that explicitly encourages semantic diversity during generation. ESamp is motivated by the well-known observation that neural networks tend to make lower-error predictions on inputs similar to those encountered before and incur higher prediction error on novel ones. Building on this property, we train a lightweight Distiller at test time to predict the LLM's deep-layer hidden representations from its shallow-layer representations, thereby modeling the LLM's depth-wise representation transitions. During decoding, the Distiller continuously adapts to the mappings induced by the current generation context. ESamp uses the Distiller's prediction error as a novelty signal to reweight candidate token extensions conditioned on the current prefix, biasing decoding toward less-explored semantic patterns. ESamp is implemented with an asynchronous training-inference pipeline, with a worst-case overhead of less than 5% (1.2% in the optimized release). Empirical results show that ESamp significantly boosts the Pass@k efficiency of reasoning models, with performance superior or comparable to strong stochastic and heuristic baselines. Notably, ESamp generalizes robustly across mathematics, science, and code generation benchmarks and breaks the trade-off between diversity and coherence in creative writing. Our code is available at https://github.com/LinesHogan/tLLM.
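To make the novelty signal concrete, the following toy sketch shows one way a prediction error could reweight candidate tokens. It is not the released implementation: the linear Distiller, the squared-error novelty score, the additive bonus `beta`, and all names (`LinearDistiller`, `reweight`) are assumptions made for illustration, and the hidden vectors are hand-written rather than taken from an LLM.

```python
import math


class LinearDistiller:
    """Hypothetical linear map from a shallow hidden vector to a deep one."""

    def __init__(self, dim, lr=0.1):
        self.lr = lr
        self.W = [[0.0] * dim for _ in range(dim)]

    def predict(self, shallow):
        return [sum(w * x for w, x in zip(row, shallow)) for row in self.W]

    def error(self, shallow, deep):
        # Mean squared prediction error, used here as the novelty signal.
        pred = self.predict(shallow)
        return sum((p - d) ** 2 for p, d in zip(pred, deep)) / len(deep)

    def update(self, shallow, deep):
        # One gradient step on the squared error (up to a constant factor),
        # standing in for test-time adaptation to the current context.
        pred = self.predict(shallow)
        for i, row in enumerate(self.W):
            g = pred[i] - deep[i]
            for j in range(len(row)):
                row[j] -= self.lr * g * shallow[j]


def reweight(logits, novelty, beta=1.0):
    """Add beta * novelty to each candidate logit (assumed rule), then softmax."""
    scores = [l + beta * n for l, n in zip(logits, novelty)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


# Toy (shallow, deep) pairs for two candidate continuations.
familiar = ([1.0, 0.0], [0.5, -0.5])  # pattern the Distiller has already seen
novel = ([0.0, 1.0], [1.0, 1.0])      # pattern it has never been trained on

distiller = LinearDistiller(dim=2)
for _ in range(200):
    distiller.update(*familiar)

novelty = [distiller.error(*familiar), distiller.error(*novel)]
probs = reweight([0.0, 0.0], novelty)  # equal base logits for both candidates
```

In this toy setting the unseen pattern keeps a high prediction error, so its candidate receives more probability mass than the familiar one even though the base logits are equal.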
