Large Language Models Assume People are More Rational than We Really are
June 24, 2024
Authors: Ryan Liu, Jiayi Geng, Joshua C. Peterson, Ilia Sucholutsky, Thomas L. Griffiths
cs.AI
Abstract
In order for AI systems to communicate effectively with people, they must
understand how we make decisions. However, people's decisions are not always
rational, so the implicit internal models of human decision-making in Large
Language Models (LLMs) must account for this. Previous empirical evidence seems
to suggest that these implicit models are accurate -- LLMs offer believable
proxies of human behavior, acting how we expect humans would in everyday
interactions. However, by comparing LLM behavior and predictions to a large
dataset of human decisions, we find that this is actually not the case: when
both simulating and predicting people's choices, a suite of cutting-edge LLMs
(GPT-4o & 4-Turbo, Llama-3-8B & 70B, Claude 3 Opus) assume that people are more
rational than we really are. Specifically, these models deviate from human
behavior and align more closely with a classic model of rational choice --
expected value theory. Interestingly, people also tend to assume that other
people are rational when interpreting their behavior. As a consequence, when we
compare the inferences that LLMs and people draw from the decisions of others
using another psychological dataset, we find that these inferences are highly
correlated. Thus, the implicit decision-making models of LLMs appear to be
aligned with the human expectation that other people will act rationally,
rather than with how people actually act.