Does GPT-4 Pass the Turing Test?

Abstract

We evaluated GPT-4 in a public online Turing Test. The best-performing GPT-4 prompt passed in 41% of games, outperforming baselines set by ELIZA (27%) and GPT-3.5 (14%), but falling short of chance and the baseline set by human participants (63%). Participants' decisions were based mainly on linguistic style (35%) and socio-emotional traits (27%), supporting the idea that intelligence is not sufficient to pass the Turing Test. Participants' demographics, including education and familiarity with LLMs, did not predict detection rate, suggesting that even those who understand systems deeply and interact with them frequently may be susceptible to deception. Despite known limitations as a test of intelligence, we argue that the Turing Test continues to be relevant as an assessment of naturalistic communication and deception. AI models with the ability to masquerade as humans could have widespread societal consequences, and we analyse the effectiveness of different strategies and criteria for judging humanlikeness.
