Open-Source Large Language Models Outperform Crowd Workers and Approach ChatGPT in Text-Annotation Tasks

Abstract

This study examines the performance of open-source Large Language Models (LLMs) in text annotation tasks and compares it with that of proprietary models such as ChatGPT and human-based services such as MTurk. While prior research has demonstrated the high performance of ChatGPT across numerous NLP tasks, open-source LLMs such as HuggingChat and FLAN are gaining attention for their cost-effectiveness, transparency, reproducibility, and superior data protection. We assess these models using both zero-shot and few-shot approaches, with different temperature settings, across a range of text annotation tasks. Our findings show that while ChatGPT achieves the best performance in most tasks, open-source LLMs not only outperform MTurk but also demonstrate competitive potential against ChatGPT in specific tasks.
