Human or Not? A Gamified Approach to the Turing Test

Abstract

We present "Human or Not?", an online game inspired by the Turing test that measures how well AI chatbots mimic humans in dialogue and how well humans tell bots from other humans. Over the course of a month, the game was played by over 1.5 million users, who engaged in anonymous two-minute chat sessions with either another human or an AI language model prompted to behave like a human. The players' task was to guess correctly whether they had spoken to a person or to an AI. This is the largest Turing-style test conducted to date, and it revealed some interesting facts. For example, overall, users guessed the identity of their partners correctly in only 68% of the games. In the subset of games in which users faced an AI bot, the correct guess rate was even lower, at 60% (that is, not much higher than chance). This white paper details the development, deployment, and results of this unique experiment. While the experiment calls for many extensions and refinements, these findings already begin to shed light on the inevitable near future in which humans and AI will commingle.

