Does GPT-4 Pass the Turing Test?

Abstract

We evaluated GPT-4 in a public online Turing Test. The best-performing GPT-4 prompt passed in 41% of games, outperforming baselines set by ELIZA (27%) and GPT-3.5 (14%), but falling short of both chance and the baseline set by human participants (63%). Participants' decisions were based mainly on linguistic style (35%) and socio-emotional traits (27%), supporting the idea that intelligence is not sufficient to pass the Turing Test. Participants' demographics, including education and familiarity with large language models (LLMs), did not predict detection rate, suggesting that even those who understand these systems deeply and interact with them frequently may be susceptible to deception. Despite its known limitations as a test of intelligence, we argue that the Turing Test continues to be relevant as an assessment of naturalistic communication and deception. AI models with the ability to masquerade as humans could have widespread societal consequences, and we analyse the effectiveness of different strategies and criteria for judging humanlikeness.
