Does GPT-4 Pass the Turing Test?
October 31, 2023
Authors: Cameron Jones, Benjamin Bergen
cs.AI
Abstract
We evaluated GPT-4 in a public online Turing Test. The best-performing GPT-4
prompt passed in 41% of games, outperforming baselines set by ELIZA (27%) and
GPT-3.5 (14%), but falling short of chance and the baseline set by human
participants (63%). Participants' decisions were based mainly on linguistic
style (35%) and socio-emotional traits (27%), supporting the idea that
intelligence is not sufficient to pass the Turing Test. Participants'
demographics, including education and familiarity with LLMs, did not predict
detection rate, suggesting that even those who understand systems deeply and
interact with them frequently may be susceptible to deception. Despite known
limitations as a test of intelligence, we argue that the Turing Test continues
to be relevant as an assessment of naturalistic communication and deception. AI
models with the ability to masquerade as humans could have widespread societal
consequences, and we analyse the effectiveness of different strategies and
criteria for judging humanlikeness.