Human or Not? A Gamified Approach to the Turing Test

Abstract

We present "Human or Not?", an online game inspired by the Turing test that measures how well AI chatbots can mimic humans in dialogue and how well humans can tell bots from other humans. Over the course of a month, the game was played by more than 1.5 million users, who engaged in anonymous two-minute chat sessions with either another human or an AI language model prompted to behave like a human. The players' task was to guess correctly whether they had spoken to a person or to an AI. This experiment, the largest Turing-style test conducted to date, revealed some interesting facts. For example, users guessed the identity of their partner correctly in only 68% of games overall. In the subset of games in which users faced an AI bot, the correct-guess rate was even lower, at 60% (that is, not much higher than chance). This white paper details the development, deployment, and results of this unique experiment. While the experiment calls for many extensions and refinements, these findings already begin to shed light on the inevitable near future in which humans and AI will commingle.
