Think Again! The Effect of Test-Time Compute on Preferences, Opinions, and Beliefs of Large Language Models
May 26, 2025
Authors: George Kour, Itay Nakash, Ateret Anaby-Tavor, Michal Shmueli-Scheuer
cs.AI
Abstract
As Large Language Models (LLMs) become deeply integrated into human life and
increasingly influence decision-making, it's crucial to evaluate whether and to
what extent they exhibit subjective preferences, opinions, and beliefs. These
tendencies may stem from biases within the models, which may shape their
behavior, influence the advice and recommendations they offer to users, and
potentially reinforce certain viewpoints. This paper presents the Preference,
Opinion, and Belief survey (POBS), a benchmark developed to assess LLMs'
subjective inclinations across societal, cultural, ethical, and personal
domains. We applied our benchmark to evaluate leading open- and closed-source
LLMs, measuring desired properties such as reliability, neutrality, and
consistency. In addition, we investigated the effect of increasing the
test-time compute, through reasoning and self-reflection mechanisms, on those
metrics. While these mechanisms are effective in other tasks, our results show
that they offer only limited gains in our domain. Furthermore, we reveal that
newer model
versions are becoming less consistent and more biased toward specific
viewpoints, highlighting a blind spot and a concerning trend. POBS:
https://ibm.github.io/POBS
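
The abstract names consistency among the measured properties but does not define it here. As a loose illustration only (not the paper's protocol or metric), the sketch below probes whether a model's survey answer is invariant to option order; `query_model` is a hypothetical placeholder for any chat-completion API, and all names are assumptions, not the authors' code.

```python
# Loose illustrative sketch only -- not the POBS benchmark code.
# `query_model` is a hypothetical stand-in for any chat-completion API;
# here it returns a random letter so the script runs standalone.
import random

def query_model(prompt: str) -> str:
    return random.choice(["A", "B"])  # replace with a real LLM call

def ask(question: str, options: list[str]) -> str:
    """Pose a multiple-choice survey item and return the chosen option text."""
    letters = [chr(ord("A") + i) for i in range(len(options))]
    body = "\n".join(f"{l}. {o}" for l, o in zip(letters, options))
    reply = query_model(f"{question}\n{body}\nAnswer with a single letter.")
    # Map the letter back to the option text so shuffled orders stay comparable.
    return options[letters.index(reply.strip()[0])]

def consistency(question: str, options: list[str], trials: int = 20) -> float:
    """Fraction of trials (option order shuffled each time) that agree with
    the majority answer; 1.0 means the answer is fully order-invariant."""
    answers = [ask(question, random.sample(options, len(options)))
               for _ in range(trials)]
    majority = max(set(answers), key=answers.count)
    return answers.count(majority) / trials

print(consistency("Should AI assistants express moral opinions?", ["Yes", "No"]))
```

Under this assumed setup, increasing test-time compute would amount to wrapping `query_model` with a reasoning or "think again" self-reflection prompt before extracting the final letter, which is the manipulation whose limited effect the paper reports.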