Large Language Models Explore by Latent Distilling
April 27, 2026
Authors: Yuanhao Zeng, Ao Lu, Lufei Li, Zheng Zhang, Yexin Li, Kan Ren
cs.AI
Abstract
Generating diverse responses is crucial for test-time scaling of large language models (LLMs), yet standard stochastic sampling mostly yields surface-level lexical variation, limiting semantic exploration. In this paper, we propose Exploratory Sampling (ESamp), a decoding approach that explicitly encourages semantic diversity during generation. ESamp is motivated by the well-known observation that neural networks tend to make lower-error predictions on inputs similar to those encountered before, and incur higher prediction error on novel ones. Building on this property, we train a lightweight Distiller at test time to predict the LLM's deep-layer hidden representations from its shallow-layer representations, thereby modeling the LLM's depth-wise representation transitions. During decoding, the Distiller continuously adapts to the mappings induced by the current generation context. ESamp uses the Distiller's prediction error as a novelty signal to reweight candidate token extensions conditioned on the current prefix, biasing decoding toward less-explored semantic patterns. ESamp is implemented with an asynchronous training-inference pipeline, incurring less than 5% worst-case overhead (1.2% in the optimized release). Empirical results show that ESamp significantly boosts the Pass@k efficiency of reasoning models, performing better than or on par with strong stochastic and heuristic baselines. Notably, ESamp generalizes robustly across mathematics, science, and code generation benchmarks and breaks the trade-off between diversity and coherence in creative writing. Our code is released at: https://github.com/LinesHogan/tLLM.
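The abstract does not spell out implementation details, so the following is a minimal PyTorch sketch of the core idea only: a lightweight module trained online to predict deep-layer hidden states from shallow-layer ones, with its prediction error used as an exploration bonus on next-token logits. The names `Distiller`, `distiller_step`, `novelty_reweighted_logits`, and the `alpha` knob are hypothetical and chosen for illustration; they are not the released API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Distiller(nn.Module):
    """Lightweight MLP that maps a shallow-layer hidden state to a
    prediction of the corresponding deep-layer hidden state."""
    def __init__(self, hidden_dim: int, bottleneck: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, bottleneck),
            nn.GELU(),
            nn.Linear(bottleneck, hidden_dim),
        )

    def forward(self, shallow_h: torch.Tensor) -> torch.Tensor:
        return self.net(shallow_h)

def distiller_step(distiller: Distiller,
                   optimizer: torch.optim.Optimizer,
                   shallow_h: torch.Tensor,
                   deep_h: torch.Tensor) -> float:
    """One online adaptation step on (shallow, deep) hidden-state pairs
    collected from the current generation context."""
    pred = distiller(shallow_h)
    loss = F.mse_loss(pred, deep_h.detach())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def novelty_reweighted_logits(logits: torch.Tensor,
                              novelty: torch.Tensor,
                              alpha: float = 1.0) -> torch.Tensor:
    """Bias sampling toward candidates the Distiller predicts poorly.

    logits:  (vocab,) raw next-token logits from the LLM.
    novelty: (vocab,) per-candidate Distiller prediction error.
    alpha:   strength of the exploration bonus (hypothetical knob).
    """
    # Standardize the novelty scores so the bonus is scale-free
    # across decoding steps.
    z = (novelty - novelty.mean()) / (novelty.std() + 1e-6)
    return logits + alpha * z
```

In a full decoding loop, `distiller_step` would run asynchronously on hidden states already produced for the accepted prefix, while the reweighting is applied at each sampling step. How the per-candidate prediction error is obtained cheaply enough to stay within the reported overhead is precisely what the paper's asynchronous training-inference pipeline addresses; the sketch above only fixes the interfaces.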