Text Generation Beyond Discrete Token Sampling
May 20, 2025
Authors: Yufan Zhuang, Liyuan Liu, Chandan Singh, Jingbo Shang, Jianfeng Gao
cs.AI
Abstract
In standard autoregressive generation, an LLM predicts the next-token
distribution, samples a discrete token, and then discards the distribution,
passing only the sampled token as new input. To preserve this distribution's
rich information, we propose Mixture of Inputs (MoI), a training-free method
for autoregressive generation. After generating a token following the standard
paradigm, we construct a new input that blends the generated discrete token
with the previously discarded token distribution. Specifically, we employ a
Bayesian estimation method that treats the token distribution as the prior, the
sampled token as the observation, and replaces the conventional one-hot vector
with the continuous posterior expectation as the new model input. MoI allows
the model to maintain a richer internal representation throughout the
generation process, resulting in improved text quality and reasoning
capabilities. On mathematical reasoning, code generation, and PhD-level QA
tasks, MoI consistently improves performance across multiple models including
QwQ-32B, Nemotron-Super-49B, Gemma-3-27B, and DAPO-Qwen-32B, with no additional
training and negligible computational overhead.
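To make the blending step concrete, the sketch below shows one simple way the continuous posterior-expectation input could be formed from the next-token distribution and the sampled token. The convex-combination form of the posterior mean, the prior-strength weight beta, and the helper name moi_input_embedding are illustrative assumptions, not the paper's exact formulation; the abstract specifies only that the distribution acts as the prior, the sampled token as the observation, and the posterior expectation replaces the one-hot input.

# Minimal sketch of MoI-style input blending (assumptions noted above).
import torch

def moi_input_embedding(logits: torch.Tensor,
                        sampled_id: int,
                        embedding: torch.nn.Embedding,
                        beta: float = 1.0) -> torch.Tensor:
    """Blend the sampled token with the predicted next-token distribution.

    logits:     unnormalized next-token scores, shape (vocab_size,)
    sampled_id: index of the token actually sampled
    embedding:  the model's input embedding table
    beta:       hypothetical prior-strength weight (an assumption, not from the paper)
    """
    prior = torch.softmax(logits, dim=-1)        # predicted distribution, used as the prior
    observation = torch.zeros_like(prior)
    observation[sampled_id] = 1.0                # one-hot observation of the sampled token
    # Posterior expectation under a simple Dirichlet-multinomial-style update:
    # a convex combination of the prior and the single observation.
    posterior_mean = (beta * prior + observation) / (beta + 1.0)
    # Continuous input: the expected embedding under the posterior, replacing the
    # standard one-hot embedding lookup of autoregressive decoding.
    return posterior_mean @ embedding.weight     # shape (hidden_dim,)

# Toy usage with a random vocabulary and embedding table.
vocab_size, hidden_dim = 32, 8
emb = torch.nn.Embedding(vocab_size, hidden_dim)
logits = torch.randn(vocab_size)
sampled = int(torch.multinomial(torch.softmax(logits, dim=-1), 1))
blended = moi_input_embedding(logits, sampled, emb)
print(blended.shape)  # torch.Size([8])

In this sketch, beta = 0 recovers standard decoding (the one-hot input), while larger beta keeps more of the otherwise-discarded distribution in the next input.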