Think Again! The Effect of Test-Time Compute on Preferences, Opinions, and Beliefs of Large Language Models
May 26, 2025
Authors: George Kour, Itay Nakash, Ateret Anaby-Tavor, Michal Shmueli-Scheuer
cs.AI
Abstract
As Large Language Models (LLMs) become deeply integrated into human life and
increasingly influence decision-making, it is crucial to evaluate whether and to
what extent they exhibit subjective preferences, opinions, and beliefs. These
tendencies may stem from biases within the models, which may shape their
behavior, influence the advice and recommendations they offer to users, and
potentially reinforce certain viewpoints. This paper presents the Preference,
Opinion, and Belief survey (POBS), a benchmark developed to assess LLMs'
subjective inclinations across societal, cultural, ethical, and personal
domains. We applied our benchmark to evaluate leading open- and closed-source
LLMs, measuring desired properties such as reliability, neutrality, and
consistency. In addition, we investigated the effect of increasing test-time
compute, through reasoning and self-reflection mechanisms, on these metrics.
Although these mechanisms are effective in other tasks, our results show that they
offer only limited gains in our domain. Furthermore, we reveal that newer model
versions are becoming less consistent and more biased toward specific
viewpoints, highlighting a blind spot and a concerning trend. POBS is available
at https://ibm.github.io/POBS
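
To make the consistency metric concrete, here is a minimal, hypothetical sketch of how one might probe a model with a POBS-style multiple-choice opinion question under several paraphrases and score agreement across phrasings. The `ask_model` callable, the prompt templates, and the scoring rule are illustrative assumptions, not the paper's actual protocol; the second function mimics the kind of "think again" self-reflection pass the abstract describes when discussing added test-time compute.

```python
# Illustrative sketch only: one way to estimate answer consistency for a
# POBS-style multiple-choice opinion question. `ask_model` is a hypothetical
# stand-in for any chat-completion client (prompt in, text out); none of this
# is the paper's official evaluation code.
from collections import Counter
from typing import Callable, List


def consistency_score(
    ask_model: Callable[[str], str],
    question: str,
    options: List[str],
    paraphrases: List[str],
) -> float:
    """Fraction of phrasings on which the model gives its modal answer.

    1.0 means the same choice regardless of wording; values near
    1/len(options) suggest wording-driven, inconsistent answers.
    """
    answers = []
    for phrasing in [question] + paraphrases:
        lettered = "\n".join(
            f"{chr(65 + i)}. {opt}" for i, opt in enumerate(options)
        )
        prompt = f"{phrasing}\n{lettered}\nAnswer with a single letter."
        # Keep only the first character as the chosen option letter.
        answers.append(ask_model(prompt).strip().upper()[:1])
    modal_count = Counter(answers).most_common(1)[0][1]
    return modal_count / len(answers)


def ask_with_reflection(ask_model: Callable[[str], str], prompt: str) -> str:
    """Spend extra test-time compute: answer once, then 'think again'."""
    first = ask_model(prompt)
    return ask_model(
        f"{prompt}\nYour previous answer was: {first}\n"
        "Think again, then give your final answer as a single letter."
    )


if __name__ == "__main__":
    def demo(prompt: str) -> str:
        # Toy stand-in model that always answers "A", so the score is 1.0.
        return "A"

    print(consistency_score(
        demo,
        "Should AI assistants express political opinions?",
        ["Yes", "No", "Unsure"],
        ["Is it acceptable for AI assistants to voice political views?"],
    ))
```

Swapping `ask_model` for a wrapper built on `ask_with_reflection` lets one compare the same metric with and without the extra reflection pass, which is the kind of before/after contrast the abstract reports.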