LLMs as Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines with LLMs

Abstract

LLMs have shown promise in replicating human-like behavior in crowdsourcing tasks that were previously thought to be exclusive to human abilities. However, current efforts focus mainly on simple, atomic tasks. We explore whether LLMs can replicate more complex crowdsourcing pipelines. We find that modern LLMs can simulate some of crowdworkers' abilities in these "human computation algorithms," but the level of success varies and is influenced by requesters' understanding of LLM capabilities, the specific skills required for sub-tasks, and the optimal interaction modality for performing these sub-tasks. We reflect on humans' and LLMs' different sensitivities to instructions, stress the importance of enabling human-facing safeguards for LLMs, and discuss the potential of training humans and LLMs with complementary skill sets. Crucially, we show that replicating crowdsourcing pipelines offers a valuable platform to investigate (1) the relative strengths of LLMs on different tasks (by cross-comparing their performance on sub-tasks) and (2) LLMs' potential in complex tasks, where they can complete part of the tasks while leaving others to humans.
