Free Lunch for Pass@k? Low Cost Diverse Sampling for Diffusion Language Models
March 5, 2026
Authors: Sean Lamont, Christian Walder, Paul Montague, Amir Dezfouli, Michael Norrish
cs.AI
Abstract
Diverse outputs in text generation are necessary for effective exploration in complex reasoning tasks, such as code generation and mathematical problem solving. Such Pass@k problems benefit from distinct candidates covering the solution space. However, traditional sampling approaches often waste computational resources on repetitive failure modes. While Diffusion Language Models have emerged as a competitive alternative to the prevailing Autoregressive paradigm, they remain susceptible to this redundancy, with independent samples frequently collapsing into similar modes. To address this, we propose a training-free, low-cost intervention to enhance generative diversity in Diffusion Language Models. Our approach modifies intermediate samples in a batch sequentially, repelling each sample from the feature space of previous samples and thereby actively penalising redundancy. Unlike prior methods that require retraining or beam search, our strategy incurs negligible computational overhead while ensuring that each sample contributes a unique perspective to the batch. We evaluate our method on the HumanEval and GSM8K benchmarks using the LLaDA-8B-Instruct model. Our results demonstrate significantly improved diversity and Pass@k performance across various temperature settings. As a simple modification to the sampling process, our method offers an immediate, low-cost improvement for current and future Diffusion Language Models in tasks that benefit from diverse solution search. We make our code available at https://github.com/sean-lamont/odd.
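The core idea, sequentially repelling each sample in a batch from its predecessors in feature space, can be sketched as follows. This is a minimal illustration, not the paper's exact update rule: the function name `repel_batch`, the inverse-square weighting, and the use of plain NumPy feature vectors (rather than the model's intermediate diffusion states) are all assumptions made for clarity.

```python
import numpy as np

def repel_batch(features: np.ndarray, strength: float = 0.1) -> np.ndarray:
    """Sequentially push each sample's feature vector away from all
    earlier samples in the batch.

    Hypothetical sketch of the sequential-repulsion idea: sample i is
    updated against samples 0..i-1, so the first sample is untouched
    and each later sample is penalised for redundancy.
    """
    out = features.astype(float).copy()
    for i in range(1, len(out)):
        for j in range(i):
            diff = out[i] - out[j]
            dist = np.linalg.norm(diff) + 1e-8  # avoid division by zero
            # Inverse-square weighting (an illustrative choice): nearby,
            # redundant samples are pushed apart more strongly than
            # samples that already differ.
            out[i] += strength * diff / (dist ** 2)
    return out
```

In an actual sampler this update would be applied to the intermediate denoising states at each diffusion step, leaving the model weights untouched, which is what makes the intervention training-free and cheap relative to retraining or beam search.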