Text Generation Beyond Discrete Token Sampling
May 20, 2025
Authors: Yufan Zhuang, Liyuan Liu, Chandan Singh, Jingbo Shang, Jianfeng Gao
cs.AI
Abstract
In standard autoregressive generation, an LLM predicts the next-token
distribution, samples a discrete token, and then discards the distribution,
passing only the sampled token as new input. To preserve this distribution's
rich information, we propose Mixture of Inputs (MoI), a training-free method
for autoregressive generation. After generating a token following the standard
paradigm, we construct a new input that blends the generated discrete token
with the previously discarded token distribution. Specifically, we employ a
Bayesian estimation method that treats the token distribution as the prior, the
sampled token as the observation, and replaces the conventional one-hot vector
with the continuous posterior expectation as the new model input. MoI allows
the model to maintain a richer internal representation throughout the
generation process, resulting in improved text quality and reasoning
capabilities. On mathematical reasoning, code generation, and PhD-level QA
tasks, MoI consistently improves performance across multiple models including
QwQ-32B, Nemotron-Super-49B, Gemma-3-27B, and DAPO-Qwen-32B, with no additional
training and negligible computational overhead.
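To make the blending step concrete, the sketch below shows one simple way the continuous posterior-expectation input could be formed from the next-token distribution and the sampled token. The convex-combination form of the posterior mean, the prior-strength weight beta, and the helper name moi_input_embedding are illustrative assumptions, not the paper's exact formulation; the abstract specifies only that the distribution acts as the prior, the sampled token as the observation, and the posterior expectation replaces the one-hot input.

# Minimal sketch of MoI-style input blending (assumptions noted above).
import torch

def moi_input_embedding(logits: torch.Tensor,
                        sampled_id: int,
                        embedding: torch.nn.Embedding,
                        beta: float = 1.0) -> torch.Tensor:
    """Blend the sampled token with the predicted next-token distribution.

    logits:     unnormalized next-token scores, shape (vocab_size,)
    sampled_id: index of the token actually sampled
    embedding:  the model's input embedding table
    beta:       hypothetical prior-strength weight (an assumption, not from the paper)
    """
    prior = torch.softmax(logits, dim=-1)        # predicted distribution, used as the prior
    observation = torch.zeros_like(prior)
    observation[sampled_id] = 1.0                # one-hot observation of the sampled token
    # Posterior expectation under a simple Dirichlet-multinomial-style update:
    # a convex combination of the prior and the single observation.
    posterior_mean = (beta * prior + observation) / (beta + 1.0)
    # Continuous input: the expected embedding under the posterior, replacing the
    # standard one-hot embedding lookup of autoregressive decoding.
    return posterior_mean @ embedding.weight     # shape (hidden_dim,)

# Toy usage with a random vocabulary and embedding table.
vocab_size, hidden_dim = 32, 8
emb = torch.nn.Embedding(vocab_size, hidden_dim)
logits = torch.randn(vocab_size)
sampled = int(torch.multinomial(torch.softmax(logits, dim=-1), 1))
blended = moi_input_embedding(logits, sampled, emb)
print(blended.shape)  # torch.Size([8])

In this sketch, beta = 0 recovers standard decoding (the one-hot input), while larger beta keeps more of the otherwise-discarded distribution in the next input.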