圧縮は真実よりも一貫性を好む：言語モデルが正しい情報を選好する条件と理由

要旨

言語モデルが混合品質のデータで学習されているにも関わらず、なぜ正しい記述を好む傾向を示すのか？本研究では「圧縮-一貫性原理」を提唱する：次のトークン予測は、学習データをより短く、内部的に一貫した記述で説明できる仮説を優先する。真実バイアスは、誤った選択肢が構造的に圧縮困難な場合にのみ現れる。この仮説を検証するため、GPT-2スタイルの小規模文字レベルトランスフォーマー（350万～8600万パラメータ）を用い、正誤規則の混合比率を制御した合成数学コーパスで実験を行った。ランダム誤り設定では、モデルは正しい補完を強く優先した：データ比率が均等な場合の正答率は83.1%、正規則がコーパスのわずか10%の場合でも67.0%を示した。一方、ランダム誤りを数学的に誤った首尾一貫した規則体系に置き換えると、正答率はほぼ偶然レベルに低下した。より自然言語に近い合成世界では効果は弱まるものの（57.7%）、依然として確認された。追加実験では、埋め込み検証ステップによって小規模モデルでも正しさへの選好が回復すること、一貫性規則の増加に伴い正答率が段階的に向上することを示した。これらの結果は、「真実バイアス」として観察される現象が、本質的な真実指向ではなく、圧縮圧力と内的整合性への選好の副次的効果であることを示唆する。コードとデータはhttps://github.com/Rai220/compression-drives-truth で公開されている。

English

Why do language models sometimes prefer correct statements even when trained on mixed-quality data? We introduce the Compression--Consistency Principle: next-token prediction favors hypotheses that allow shorter and more internally consistent descriptions of the training data. Truth bias emerges only when false alternatives are structurally harder to compress. We test this using small GPT-2-style character-level transformers (3.5M--86M parameters) on synthetic math corpora with controlled mixtures of correct and incorrect rules. In the random-error setting, models strongly prefer correct completions in paired evaluation: 83.1% accuracy at balanced data and 67.0% even when correct rules appear in only 10% of the corpus. Replacing random errors with a coherent but mathematically incorrect rule system largely eliminates the preference (near-chance accuracy). In a more natural-language-like synthetic world, the effect is weaker but still present (57.7%). Additional experiments show that embedding verification steps can restore preference for correctness even at small scale, while increasing the number of consistent rules produces a graded improvement in accuracy. Our results suggest that what appears as a "truth bias" is largely a side effect of compression pressure and preference for internal consistency, rather than an intrinsic drive toward truth. Full code and data are available at https://github.com/Rai220/compression-drives-truth.

圧縮は真実よりも一貫性を好む：言語モデルが正しい情報を選好する条件と理由

Compression Favors Consistency, Not Truth: When and Why Language Models Prefer Correct Information

要旨

Support