Large Language Models Explore by Latent Distilling

Abstract

Generating diverse responses is crucial for test-time scaling of large language models (LLMs), yet standard stochastic sampling mostly yields surface-level lexical variation, limiting semantic exploration. In this paper, we propose Exploratory Sampling (ESamp), a decoding approach that explicitly encourages semantic diversity during generation. ESamp is motivated by the well-known observation that neural networks tend to make lower-error predictions on inputs similar to those encountered before and incur higher prediction error on novel ones. Building on this property, we train a lightweight Distiller at test time to predict the LLM's deep-layer hidden representations from its shallow-layer representations, so as to model the LLM's depth-wise representation transitions. During decoding, the Distiller continuously adapts to the mappings induced by the current generation context. ESamp uses the prediction error as a novelty signal to reweight candidate token extensions conditioned on the current prefix, thereby biasing decoding toward less-explored semantic patterns. ESamp is implemented with an asynchronous training-inference pipeline, with less than 5% worst-case overhead (1.2% in the optimized release). Empirical results show that ESamp significantly boosts the Pass@k efficiency of reasoning models, matching or outperforming strong stochastic and heuristic baselines. Notably, ESamp generalizes robustly across mathematics, science, and code generation benchmarks and breaks the trade-off between diversity and coherence in creative writing. Our code is released at: https://github.com/LinesHogan/tLLM.
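The novelty-reweighting idea in the abstract can be illustrated with a toy sketch. This is not the released tLLM implementation; everything in it is an assumption made for illustration: the `LinearDistiller` class, its plain SGD update, the mean-squared-error novelty score, the additive logit bonus in `reweight`, and the `beta` strength parameter are all hypothetical. The abstract does not specify the Distiller architecture or the exact reweighting rule.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearDistiller:
    """Hypothetical stand-in for the Distiller: a linear map that predicts a
    deep-layer hidden state from a shallow-layer hidden state."""

    def __init__(self, dim_in, dim_out, lr=0.1):
        self.w = [[0.0] * dim_in for _ in range(dim_out)]
        self.lr = lr

    def predict(self, shallow):
        return [sum(wij * x for wij, x in zip(row, shallow)) for row in self.w]

    def error(self, shallow, deep):
        """Mean squared prediction error, used here as the novelty signal."""
        pred = self.predict(shallow)
        return sum((p - d) ** 2 for p, d in zip(pred, deep)) / len(deep)

    def update(self, shallow, deep):
        """One SGD step on the squared error (test-time adaptation)."""
        pred = self.predict(shallow)
        for i, row in enumerate(self.w):
            grad_out = pred[i] - deep[i]
            for j, x in enumerate(shallow):
                row[j] -= self.lr * grad_out * x


def reweight(logits, novelty, beta=1.0):
    """Assumed rule: add beta * novelty to each candidate's logit, then
    renormalize, so candidates with higher prediction error gain mass."""
    return softmax([l + beta * n for l, n in zip(logits, novelty)])


if __name__ == "__main__":
    distiller = LinearDistiller(dim_in=2, dim_out=2)
    seen = ([1.0, 0.0], [0.5, -0.5])    # (shallow, deep) pair visited often
    unseen = ([0.0, 1.0], [1.0, 1.0])   # pair never visited
    for _ in range(50):
        distiller.update(*seen)
    novelty = [distiller.error(*seen), distiller.error(*unseen)]
    print(reweight([1.0, 1.0], novelty))
```

In this toy setting the two candidates start with equal logits, and the candidate whose hidden-state transition the distiller has not yet learned ends up with the larger probability, which is the qualitative behavior the abstract describes.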
