SimpleStrat: Diversifying Language Model Generation with Stratification
October 11, 2024
Authors: Justin Wong, Yury Orlovskiy, Michael Luo, Sanjit A. Seshia, Joseph E. Gonzalez
cs.AI
Abstract
Generating diverse responses from large language models (LLMs) is crucial for applications such as planning/search and synthetic data generation, where diversity provides distinct answers across generations. Prior approaches rely on increasing temperature to increase diversity. However, contrary to popular belief, we show that not only does this approach produce lower-quality individual generations as temperature increases, but it also depends on the model's next-token probabilities being similar to the true distribution of answers. We propose SimpleStrat, an alternative approach that uses the language model itself to partition the space into strata. At inference, a random stratum is selected and a sample is drawn from within that stratum. To measure diversity, we introduce CoverageQA, a dataset of underspecified questions with multiple equally plausible answers, and assess diversity by measuring the KL divergence between the output distribution and the uniform distribution over valid ground-truth answers. As computing the probability of each response/solution is infeasible for proprietary models, we instead measure recall on the ground-truth solutions. Our evaluation shows that SimpleStrat achieves 0.05 higher recall than GPT-4o and an average reduction in KL divergence of 0.36 compared to Llama 3.
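The two evaluation metrics the abstract names can likewise be written down directly. A minimal sketch, assuming answers are compared by exact string match (the paper's matching procedure is not specified in the abstract) and that the divergence is taken from the empirical output distribution toward the uniform target over valid answers:

```python
import math
from collections import Counter

def kl_to_uniform(samples: list[str], valid_answers: set[str]) -> float:
    """KL(p || u), where p is the empirical distribution of sampled answers
    restricted to the valid set and u is uniform over valid_answers.
    Invalid samples are dropped here; the paper may handle them differently."""
    counts = Counter(a for a in samples if a in valid_answers)
    total = sum(counts.values())
    u = 1.0 / len(valid_answers)  # uniform target mass per answer
    kl = 0.0
    for answer in valid_answers:
        p = counts[answer] / total if total else 0.0
        if p > 0:  # 0 * log(0) contributes nothing
            kl += p * math.log(p / u)
    return kl

def recall(samples: list[str], valid_answers: set[str]) -> float:
    """Fraction of ground-truth answers produced at least once."""
    return len(valid_answers & set(samples)) / len(valid_answers)

# Toy usage: three samples covering two of three valid answers.
samples = ["Alabama", "Alaska", "Alabama"]
valid = {"Alabama", "Alaska", "Arizona"}
print(kl_to_uniform(samples, valid))  # lower is more uniform/diverse
print(recall(samples, valid))         # 2/3 of answers recovered
```

Recall is the metric used for proprietary models, where per-response probabilities cannot be computed, so only coverage of the ground-truth set is measured.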