Think Again! The Effect of Test-Time Compute on Preferences, Opinions, and Beliefs of Large Language Models
May 26, 2025
Authors: George Kour, Itay Nakash, Ateret Anaby-Tavor, Michal Shmueli-Scheuer
cs.AI
Abstract
As Large Language Models (LLMs) become deeply integrated into human life and
increasingly influence decision-making, it's crucial to evaluate whether and to
what extent they exhibit subjective preferences, opinions, and beliefs. These
tendencies may stem from biases within the models, which may shape their
behavior, influence the advice and recommendations they offer to users, and
potentially reinforce certain viewpoints. This paper presents the Preference,
Opinion, and Belief survey (POBS), a benchmark developed to assess LLMs'
subjective inclinations across societal, cultural, ethical, and personal
domains. We applied our benchmark to evaluate leading open- and closed-source
LLMs, measuring desired properties such as reliability, neutrality, and
consistency. In addition, we investigated the effect of increasing the
test-time compute, through reasoning and self-reflection mechanisms, on those
metrics. While these mechanisms are effective in other tasks, our results show
that they offer only limited gains in our domain. Furthermore, we reveal that
newer model
versions are becoming less consistent and more biased toward specific
viewpoints, highlighting a blind spot and a concerning trend. POBS:
https://ibm.github.io/POBS
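
The abstract names consistency among the measured properties but does not define it here. As a loose illustration only (not the paper's protocol or metric), the sketch below probes whether a model's survey answer is invariant to option order; `query_model` is a hypothetical placeholder for any chat-completion API, and all names are assumptions, not the authors' code.

```python
# Loose illustrative sketch only -- not the POBS benchmark code.
# `query_model` is a hypothetical stand-in for any chat-completion API;
# here it returns a random letter so the script runs standalone.
import random

def query_model(prompt: str) -> str:
    return random.choice(["A", "B"])  # replace with a real LLM call

def ask(question: str, options: list[str]) -> str:
    """Pose a multiple-choice survey item and return the chosen option text."""
    letters = [chr(ord("A") + i) for i in range(len(options))]
    body = "\n".join(f"{l}. {o}" for l, o in zip(letters, options))
    reply = query_model(f"{question}\n{body}\nAnswer with a single letter.")
    # Map the letter back to the option text so shuffled orders stay comparable.
    return options[letters.index(reply.strip()[0])]

def consistency(question: str, options: list[str], trials: int = 20) -> float:
    """Fraction of trials (option order shuffled each time) that agree with
    the majority answer; 1.0 means the answer is fully order-invariant."""
    answers = [ask(question, random.sample(options, len(options)))
               for _ in range(trials)]
    majority = max(set(answers), key=answers.count)
    return answers.count(majority) / trials

print(consistency("Should AI assistants express moral opinions?", ["Yes", "No"]))
```

Under this assumed setup, increasing test-time compute would amount to wrapping `query_model` with a reasoning or "think again" self-reflection prompt before extracting the final letter, which is the manipulation whose limited effect the paper reports.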