Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs

Abstract

Large language models (LLMs) exhibit cognitive biases: systematic tendencies toward irrational decision-making, similar to those seen in humans. Prior work has found that these biases vary across models and can be amplified by instruction tuning. However, it remains unclear whether these differences in biases stem from pretraining, finetuning, or even random noise due to training stochasticity. We propose a two-step causal experimental approach to disentangle these factors. First, we finetune models multiple times using different random seeds to study how training randomness affects over 30 cognitive biases. Second, we introduce cross-tuning, which swaps instruction datasets between models to isolate bias sources. The swap uses datasets that led to different bias patterns, so it directly tests whether biases are dataset-dependent. Our findings reveal that while training randomness introduces some variability, biases are mainly shaped by pretraining: models with the same pretrained backbone exhibit more similar bias patterns than those sharing only finetuning data. These insights suggest that understanding biases in finetuned models requires considering their pretraining origins, not only finetuning effects. This perspective can guide future efforts to develop principled strategies for evaluating and mitigating bias in LLMs.
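The two-step design can be illustrated with a minimal sketch. It assumes each finetuned model has already been scored on every bias, yielding one vector of per-bias scores per (backbone, instruction dataset, seed). Step one averages over seeds and reports how much the seeds disagree. Step two compares models that share only a backbone against models that share only a finetuning dataset. The function names (`seed_average`, `cosine`, `compare_sources`), the use of cosine similarity, and the toy scores are all hypothetical illustrations, not the authors' code, metric, or results.

```python
import math
from itertools import combinations
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical layout: (backbone, instruction_dataset, seed) -> per-bias scores,
# one float per cognitive bias, in the same bias order for every model.
ModelKey = Tuple[str, str, int]
Scores = Dict[ModelKey, List[float]]


def seed_average(scores: Scores):
    """Average each (backbone, dataset) model over its random seeds.

    Also returns a seed spread: the population standard deviation across
    seeds for each bias, averaged over biases (a rough randomness measure).
    """
    groups: Dict[Tuple[str, str], List[List[float]]] = {}
    for (backbone, dataset, _seed), vec in scores.items():
        groups.setdefault((backbone, dataset), []).append(vec)

    averaged, spread = {}, {}
    for key, vecs in groups.items():
        n, dims = len(vecs), len(vecs[0])
        means = [sum(v[i] for v in vecs) / n for i in range(dims)]
        stds = [math.sqrt(sum((v[i] - means[i]) ** 2 for v in vecs) / n)
                for i in range(dims)]
        averaged[key] = means
        spread[key] = sum(stds) / dims
    return averaged, spread


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two bias-score vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def compare_sources(averaged):
    """Mean similarity of pairs sharing only a backbone vs. only a dataset.

    Pairs that differ in both backbone and dataset are skipped. Needs at
    least one pair of each kind, otherwise statistics.mean raises an error.
    """
    same_backbone, same_dataset = [], []
    for (k1, v1), (k2, v2) in combinations(averaged.items(), 2):
        if k1[0] == k2[0] and k1[1] != k2[1]:
            same_backbone.append(cosine(v1, v2))
        elif k1[1] == k2[1] and k1[0] != k2[0]:
            same_dataset.append(cosine(v1, v2))
    return mean(same_backbone), mean(same_dataset)


# Toy 2x2 cross-tuning grid: backbones A/B x datasets D1/D2, two seeds each.
# These numbers are invented for illustration and are NOT results from the paper.
toy: Scores = {
    ("A", "D1", 0): [0.85, 0.12, -0.40], ("A", "D1", 1): [0.83, 0.10, -0.38],
    ("A", "D2", 0): [0.80, 0.15, -0.42], ("A", "D2", 1): [0.78, 0.17, -0.40],
    ("B", "D1", 0): [-0.25, 0.70, 0.50], ("B", "D1", 1): [-0.23, 0.72, 0.48],
    ("B", "D2", 0): [-0.30, 0.75, 0.52], ("B", "D2", 1): [-0.32, 0.73, 0.50],
}

averaged, spread = seed_average(toy)
backbone_sim, dataset_sim = compare_sources(averaged)
print(f"same backbone: {backbone_sim:.3f}, same dataset: {dataset_sim:.3f}")
```

In this toy grid the backbone dominates by construction, so same-backbone pairs come out far more similar than same-dataset pairs; with real bias scores the comparison would be the empirical test, and a correlation-based metric could be swapped in for cosine similarity.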
