Open-Source Large Language Models Outperform Crowd Workers and Approach ChatGPT in Text-Annotation Tasks

Abstract

This study examines the performance of open-source Large Language Models (LLMs) in text annotation tasks and compares it with that of proprietary models such as ChatGPT and human-based services such as MTurk. While prior research has demonstrated the high performance of ChatGPT across numerous NLP tasks, open-source LLMs such as HuggingChat and FLAN are gaining attention for their cost-effectiveness, transparency, reproducibility, and superior data protection. We assess these models using both zero-shot and few-shot approaches and different temperature parameters across a range of text annotation tasks. Our findings show that while ChatGPT achieves the best performance in most tasks, open-source LLMs not only outperform MTurk but also demonstrate competitive potential against ChatGPT in specific tasks.
