LLMs as Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines with LLMs

Abstract

LLMs have shown promise in replicating human-like behavior in crowdsourcing tasks that were previously thought to be exclusive to human abilities. However, current efforts focus mainly on simple atomic tasks. We explore whether LLMs can replicate more complex crowdsourcing pipelines. We find that modern LLMs can simulate some of crowdworkers' abilities in these "human computation algorithms," but the level of success is variable and influenced by requesters' understanding of LLM capabilities, the specific skills required for sub-tasks, and the optimal interaction modality for performing these sub-tasks. We reflect on the different sensitivities of humans and LLMs to instructions, stress the importance of enabling human-facing safeguards for LLMs, and discuss the potential of training humans and LLMs with complementary skill sets. Crucially, we show that replicating crowdsourcing pipelines offers a valuable platform to investigate (1) the relative strengths of LLMs on different tasks (by cross-comparing their performances on sub-tasks) and (2) LLMs' potential in complex tasks, where they can complete part of the tasks while leaving others to humans.
