

SimpleStrat: Diversifying Language Model Generation with Stratification

October 11, 2024
作者: Justin Wong, Yury Orlovskiy, Michael Luo, Sanjit A. Seshia, Joseph E. Gonzalez
cs.AI

Abstract
Generating diverse responses from large language models (LLMs) is crucial for applications such as planning/search and synthetic data generation, where diversity provides distinct answers across generations. Prior approaches rely on increasing temperature to increase diversity. However, contrary to popular belief, we show that not only does this approach produce lower-quality individual generations as temperature increases, but it also depends on the model's next-token probabilities resembling the true distribution of answers. We propose SimpleStrat, an alternative approach that uses the language model itself to partition the space into strata. At inference time, a random stratum is selected and a sample is drawn from within it. To measure diversity, we introduce CoverageQA, a dataset of underspecified questions with multiple equally plausible answers, and assess diversity by measuring the KL divergence between the output distribution and the uniform distribution over valid ground-truth answers. As computing the probability of each response/solution is infeasible for proprietary models, we instead measure recall on the ground-truth solutions. Our evaluation shows that SimpleStrat achieves 0.05 higher recall compared to GPT-4o and an average 0.36 reduction in KL divergence compared to Llama 3.
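The two ideas in the abstract — sampling via a randomly chosen stratum, and scoring diversity as KL divergence to a uniform distribution over valid answers — can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: `propose_strata` and `sample_in_stratum` are hypothetical stand-ins for the prompted LLM calls that partition the answer space and generate within a partition.

```python
import math
import random

def kl_to_uniform(counts, valid_answers):
    """KL divergence between the empirical answer distribution and the
    uniform distribution over valid ground-truth answers.
    0 means perfectly uniform coverage; larger means less diverse."""
    total = sum(counts.values())
    q = 1.0 / len(valid_answers)  # uniform mass per valid answer
    kl = 0.0
    for ans, c in counts.items():
        if ans not in valid_answers:
            return math.inf  # mass on an invalid answer: divergence is infinite
        p = c / total
        kl += p * math.log(p / q)
    return kl

def stratified_sample(propose_strata, sample_in_stratum, question, n):
    """Stratified generation sketch: partition the answer space once,
    then for each generation pick a random stratum and sample within it."""
    strata = propose_strata(question)  # e.g. an LLM call listing answer categories
    return [sample_in_stratum(question, random.choice(strata)) for _ in range(n)]
```

For example, if four answers are valid but the model only ever produces two of them with equal frequency, `kl_to_uniform` returns log 2 ≈ 0.69, quantifying the missing coverage that stratification aims to recover.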

