Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity

Abstract

Post-training alignment often reduces LLM diversity, a phenomenon known as mode collapse. Unlike prior work that attributes this effect to algorithmic limitations, we identify a fundamental, pervasive data-level driver: typicality bias in preference data, whereby annotators systematically favor familiar text, consistent with well-established findings in cognitive psychology. We formalize this bias theoretically, verify it empirically on preference datasets, and show that it plays a central role in mode collapse. Motivated by this analysis, we introduce Verbalized Sampling (VS), a simple, training-free prompting strategy to circumvent mode collapse. VS prompts the model to verbalize a probability distribution over a set of responses (e.g., "Generate 5 jokes about coffee and their corresponding probabilities"). Comprehensive experiments show that VS significantly improves performance across creative writing (poems, stories, jokes), dialogue simulation, open-ended QA, and synthetic data generation, without sacrificing factual accuracy or safety. For instance, in creative writing, VS increases diversity by 1.6-2.1x over direct prompting. We further observe an emergent trend that more capable models benefit more from VS. In sum, our work provides a new data-centric perspective on mode collapse and a practical inference-time remedy that helps unlock pre-trained generative diversity.
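
To make the prompting strategy concrete, the following is a minimal sketch of how a VS-style request might be built and consumed. It is not the authors' implementation. The function names, the "<probability> | <response>" line format, and the canned reply in `EXAMPLE_REPLY` are all assumptions made for illustration; no model is called, and the jokes and probabilities are made up rather than taken from the paper.

```python
import random
import re

# Hypothetical model reply; a real reply would come from an LLM call,
# which is deliberately left out of this sketch.
EXAMPLE_REPLY = """\
0.40 | Why did the coffee file a police report? It got mugged.
0.35 | Decaf is just coffee with commitment issues.
0.25 | My espresso and I have a latte in common.
"""


def build_vs_prompt(task: str, k: int = 5) -> str:
    """Build a VS-style prompt. The output format instruction is an assumption."""
    return (
        f"Generate {k} {task} and their corresponding probabilities. "
        "Format each line as: <probability> | <response>"
    )


def parse_vs_response(text: str) -> list[tuple[str, float]]:
    """Parse lines in the assumed '<probability> | <response>' format."""
    pairs = []
    for line in text.splitlines():
        match = re.match(r"\s*([0-9]*\.?[0-9]+)\s*\|\s*(.+)", line)
        if match:  # silently skip lines that do not follow the format
            pairs.append((match.group(2).strip(), float(match.group(1))))
    return pairs


def sample_response(pairs: list[tuple[str, float]], rng: random.Random) -> str:
    """Draw one response, weighting by the verbalized probabilities."""
    total = sum(prob for _, prob in pairs)
    if total <= 0:
        raise ValueError("verbalized probabilities must sum to a positive value")
    responses = [resp for resp, _ in pairs]
    weights = [prob / total for _, prob in pairs]  # renormalize defensively
    return rng.choices(responses, weights=weights, k=1)[0]


if __name__ == "__main__":
    print(build_vs_prompt("jokes about coffee", k=5))
    candidates = parse_vs_response(EXAMPLE_REPLY)
    print(sample_response(candidates, random.Random(0)))
```

Renormalizing the weights is a defensive choice in this sketch, since verbalized probabilities need not sum exactly to one.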
