Think Again! The Effect of Test-Time Compute on Preferences, Opinions, and Beliefs of Large Language Models
May 26, 2025
Authors: George Kour, Itay Nakash, Ateret Anaby-Tavor, Michal Shmueli-Scheuer
cs.AI
Abstract
As Large Language Models (LLMs) become deeply integrated into human life and
increasingly influence decision-making, it is crucial to evaluate whether and to
what extent they exhibit subjective preferences, opinions, and beliefs. These
tendencies may stem from biases within the models, which may shape their
behavior, influence the advice and recommendations they offer to users, and
potentially reinforce certain viewpoints. This paper presents the Preference,
Opinion, and Belief survey (POBS), a benchmark developed to assess LLMs'
subjective inclinations across societal, cultural, ethical, and personal
domains. We applied our benchmark to evaluate leading open- and closed-source
LLMs, measuring desired properties such as reliability, neutrality, and
consistency. In addition, we investigated the effect of increasing test-time
compute, through reasoning and self-reflection mechanisms, on these metrics.
Although these mechanisms are effective in other tasks, our results show that they
offer only limited gains in our domain. Furthermore, we reveal that newer model
versions are becoming less consistent and more biased toward specific
viewpoints, highlighting a blind spot and a concerning trend. POBS is available
at https://ibm.github.io/POBS
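
To make the consistency metric concrete, here is a minimal, hypothetical sketch of how one might probe a model with a POBS-style multiple-choice opinion question under several paraphrases and score agreement across phrasings. The `ask_model` callable, the prompt templates, and the scoring rule are illustrative assumptions, not the paper's actual protocol; the second function mimics the kind of "think again" self-reflection pass the abstract describes when discussing added test-time compute.

```python
# Illustrative sketch only: one way to estimate answer consistency for a
# POBS-style multiple-choice opinion question. `ask_model` is a hypothetical
# stand-in for any chat-completion client (prompt in, text out); none of this
# is the paper's official evaluation code.
from collections import Counter
from typing import Callable, List


def consistency_score(
    ask_model: Callable[[str], str],
    question: str,
    options: List[str],
    paraphrases: List[str],
) -> float:
    """Fraction of phrasings on which the model gives its modal answer.

    1.0 means the same choice regardless of wording; values near
    1/len(options) suggest wording-driven, inconsistent answers.
    """
    answers = []
    for phrasing in [question] + paraphrases:
        lettered = "\n".join(
            f"{chr(65 + i)}. {opt}" for i, opt in enumerate(options)
        )
        prompt = f"{phrasing}\n{lettered}\nAnswer with a single letter."
        # Keep only the first character as the chosen option letter.
        answers.append(ask_model(prompt).strip().upper()[:1])
    modal_count = Counter(answers).most_common(1)[0][1]
    return modal_count / len(answers)


def ask_with_reflection(ask_model: Callable[[str], str], prompt: str) -> str:
    """Spend extra test-time compute: answer once, then 'think again'."""
    first = ask_model(prompt)
    return ask_model(
        f"{prompt}\nYour previous answer was: {first}\n"
        "Think again, then give your final answer as a single letter."
    )


if __name__ == "__main__":
    def demo(prompt: str) -> str:
        # Toy stand-in model that always answers "A", so the score is 1.0.
        return "A"

    print(consistency_score(
        demo,
        "Should AI assistants express political opinions?",
        ["Yes", "No", "Unsure"],
        ["Is it acceptable for AI assistants to voice political views?"],
    ))
```

Swapping `ask_model` for a wrapper built on `ask_with_reflection` lets one compare the same metric with and without the extra reflection pass, which is the kind of before/after contrast the abstract reports.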