Does GPT-4 Pass the Turing Test?

Abstract

We evaluated GPT-4 in a public online Turing Test. The best-performing GPT-4 prompt passed in 41% of games, outperforming baselines set by ELIZA (27%) and GPT-3.5 (14%), but falling short of chance and the baseline set by human participants (63%). Participants' decisions were based mainly on linguistic style (35%) and socio-emotional traits (27%), supporting the idea that intelligence is not sufficient to pass the Turing Test. Participants' demographics, including education and familiarity with LLMs, did not predict detection rate, suggesting that even those who understand systems deeply and interact with them frequently may be susceptible to deception. Despite known limitations as a test of intelligence, we argue that the Turing Test continues to be relevant as an assessment of naturalistic communication and deception. AI models with the ability to masquerade as humans could have widespread societal consequences, and we analyse the effectiveness of different strategies and criteria for judging humanlikeness.
