Open-Source Large Language Models Outperform Crowd Workers and Approach ChatGPT in Text-Annotation Tasks

Abstract

This study examines the performance of open-source Large Language Models (LLMs) in text annotation tasks and compares it with proprietary models like ChatGPT and human-based services such as MTurk. While prior research has demonstrated the high performance of ChatGPT across numerous NLP tasks, open-source LLMs like HuggingChat and FLAN are gaining attention for their cost-effectiveness, transparency, reproducibility, and superior data protection. We assess these models using both zero-shot and few-shot approaches and different temperature parameters across a range of text annotation tasks. Our findings show that while ChatGPT achieves the best performance in most tasks, open-source LLMs not only outperform MTurk but also demonstrate competitive potential against ChatGPT in specific tasks.
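To make the two experimental settings named above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' code or prompts. The function names `build_prompt` and `temperature_softmax`, the instruction wording, the label set, and the example texts are all hypothetical. `build_prompt` yields a zero-shot prompt when given no labeled examples and a few-shot prompt when given some. `temperature_softmax` shows how a temperature T rescales a model's output scores before sampling: a lower T concentrates probability on the top label, while a higher T flattens the distribution.

```python
import math
from typing import Sequence


def build_prompt(text: str, labels: Sequence[str],
                 examples: Sequence[tuple[str, str]] = ()) -> str:
    """Build an annotation prompt: zero-shot if `examples` is empty, few-shot otherwise.

    The instruction wording and labels are hypothetical placeholders.
    """
    lines = [f"Classify the text into one of: {', '.join(labels)}."]
    # Few-shot: insert labeled demonstrations before the target text.
    for ex_text, ex_label in examples:
        lines.append(f"Text: {ex_text}\nLabel: {ex_label}")
    # The target text ends with an open "Label:" slot for the model to fill.
    lines.append(f"Text: {text}\nLabel:")
    return "\n\n".join(lines)


def temperature_softmax(logits: Sequence[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    labels = ["relevant", "irrelevant"]  # hypothetical label set
    target = "Tweet about content moderation."
    zero_shot = build_prompt(target, labels)
    few_shot = build_prompt(
        target,
        labels,
        examples=[("Platforms should remove hate speech.", "relevant"),
                  ("Lovely weather today.", "irrelevant")],
    )
    print(zero_shot, end="\n---\n")
    print(few_shot)

    logits = [2.0, 1.0]  # hypothetical model scores for the two labels
    for t in (0.2, 1.0):
        print(t, [round(p, 3) for p in temperature_softmax(logits, t)])
```

With the same scores, the lower temperature assigns a larger share of the probability to the top label. This is one reason annotation output tends to be more consistent across repeated runs at low temperature.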