Open-Source Large Language Models Outperform Crowd Workers and Approach ChatGPT in Text-Annotation Tasks
July 5, 2023
Authors: Meysam Alizadeh, Maël Kubli, Zeynab Samei, Shirin Dehghani, Juan Diego Bermeo, Maria Korobeynikova, Fabrizio Gilardi
cs.AI
Abstract
This study examines the performance of open-source Large Language Models
(LLMs) in text annotation tasks and compares it with proprietary models like
ChatGPT and human-based services such as MTurk. While prior research
demonstrated the high performance of ChatGPT across numerous NLP tasks,
open-source LLMs like HuggingChat and FLAN are gaining attention for their
cost-effectiveness, transparency, reproducibility, and superior data
protection. We assess these models using both zero-shot and few-shot approaches
and different temperature parameters across a range of text annotation tasks.
Our findings show that while ChatGPT achieves the best performance in most
tasks, open-source LLMs not only outperform MTurk but also demonstrate
competitive potential against ChatGPT in specific tasks.
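The zero-shot and few-shot setups mentioned in the abstract can be illustrated with a minimal prompt-construction sketch. This is a hypothetical example, not the authors' actual pipeline: the task wording, labels, and example tweets are all assumptions made for illustration, and the resulting prompt string would be sent to a model with the chosen temperature setting.

```python
# Hypothetical sketch of zero-shot vs. few-shot annotation prompts.
# The task description, label set, and demonstration examples below are
# illustrative assumptions, not the prompts used in the study.

def build_prompt(text, labels, examples=None):
    """Build an annotation prompt; zero-shot when examples is None,
    few-shot when labeled demonstrations are supplied."""
    header = (
        "Classify the following tweet into one of these categories: "
        + ", ".join(labels)
        + ".\n\n"
    )
    shots = ""
    if examples:  # few-shot: prepend labeled demonstrations
        for ex_text, ex_label in examples:
            shots += f"Tweet: {ex_text}\nLabel: {ex_label}\n\n"
    return header + shots + f"Tweet: {text}\nLabel:"


# Zero-shot: the model sees only the task description and the input.
zero_shot = build_prompt(
    "Great service, will definitely come back!",
    ["positive", "negative"],
)

# Few-shot: one labeled demonstration precedes the input to annotate.
few_shot = build_prompt(
    "Great service, will definitely come back!",
    ["positive", "negative"],
    examples=[("Terrible experience, never again.", "negative")],
)
```

The same prompt would then be submitted to each model (ChatGPT, HuggingChat, FLAN) at different temperature values to compare annotation quality across settings.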