Think Again! The Effect of Test-Time Compute on Preferences, Opinions, and Beliefs of Large Language Models

Abstract

As Large Language Models (LLMs) become deeply integrated into human life and increasingly influence decision-making, it is crucial to evaluate whether, and to what extent, they exhibit subjective preferences, opinions, and beliefs. These tendencies may stem from biases within the models, which may shape their behavior, influence the advice and recommendations they offer to users, and potentially reinforce certain viewpoints. This paper presents the Preference, Opinion, and Belief survey (POBs), a benchmark developed to assess LLMs' subjective inclinations across societal, cultural, ethical, and personal domains. We applied our benchmark to leading open- and closed-source LLMs, measuring desired properties such as reliability, neutrality, and consistency. In addition, we investigated how increasing test-time compute, through reasoning and self-reflection mechanisms, affects those metrics. Although these mechanisms are effective in other tasks, our results show that they offer only limited gains in our domain. Furthermore, we find that newer model versions are becoming less consistent and more biased toward specific viewpoints, highlighting a blind spot and a concerning trend. POBS: https://ibm.github.io/POBS
