Open-Source Large Language Models Outperform Crowd Workers and Approach ChatGPT in Text-Annotation Tasks
July 5, 2023
Authors: Meysam Alizadeh, Maël Kubli, Zeynab Samei, Shirin Dehghani, Juan Diego Bermeo, Maria Korobeynikova, Fabrizio Gilardi
cs.AI
Abstract
This study examines the performance of open-source Large Language Models
(LLMs) in text annotation tasks and compares it with proprietary models like
ChatGPT and human-based services such as MTurk. While prior research
demonstrated the high performance of ChatGPT across numerous NLP tasks,
open-source LLMs like HuggingChat and FLAN are gaining attention for their
cost-effectiveness, transparency, reproducibility, and superior data
protection. We assess these models using both zero-shot and few-shot approaches
and different temperature parameters across a range of text annotation tasks.
Our findings show that while ChatGPT achieves the best performance in most
tasks, open-source LLMs not only outperform MTurk but also demonstrate
competitive potential against ChatGPT in specific tasks.
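The zero-shot and few-shot setups mentioned in the abstract can be illustrated with a minimal prompt-construction sketch. This is a hypothetical example, not the authors' actual pipeline: the task wording, labels, and example tweets are all assumptions made for illustration, and the resulting prompt string would be sent to a model with the chosen temperature setting.

```python
# Hypothetical sketch of zero-shot vs. few-shot annotation prompts.
# The task description, label set, and demonstration examples below are
# illustrative assumptions, not the prompts used in the study.

def build_prompt(text, labels, examples=None):
    """Build an annotation prompt; zero-shot when examples is None,
    few-shot when labeled demonstrations are supplied."""
    header = (
        "Classify the following tweet into one of these categories: "
        + ", ".join(labels)
        + ".\n\n"
    )
    shots = ""
    if examples:  # few-shot: prepend labeled demonstrations
        for ex_text, ex_label in examples:
            shots += f"Tweet: {ex_text}\nLabel: {ex_label}\n\n"
    return header + shots + f"Tweet: {text}\nLabel:"


# Zero-shot: the model sees only the task description and the input.
zero_shot = build_prompt(
    "Great service, will definitely come back!",
    ["positive", "negative"],
)

# Few-shot: one labeled demonstration precedes the input to annotate.
few_shot = build_prompt(
    "Great service, will definitely come back!",
    ["positive", "negative"],
    examples=[("Terrible experience, never again.", "negative")],
)
```

The same prompt would then be submitted to each model (ChatGPT, HuggingChat, FLAN) at different temperature values to compare annotation quality across settings.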