Increasing Diversity While Maintaining Accuracy: Text Data Generation with Large Language Models and Human Interventions

Abstract

Large language models (LLMs) can be used to generate text data for training and evaluating other models. However, creating high-quality datasets with LLMs can be challenging. In this work, we explore human-AI partnerships to facilitate high diversity and accuracy in LLM-based text data generation. We first examine two approaches to diversifying text generation: 1) logit suppression, which reduces the likelihood of generating language that has already been generated frequently, and 2) temperature sampling, which flattens the token sampling probability distribution. We found that these diversification approaches can increase data diversity, but often at the cost of data accuracy (i.e., whether the text and labels are appropriate for the target domain). To address this issue, we examined two human interventions: 1) label replacement (LR), which corrects misaligned labels, and 2) out-of-scope filtering (OOSF), which removes instances that fall outside the user's domain of interest or to which none of the considered labels applies. Through oracle studies, we found that LR increases the absolute accuracy of models trained with diversified datasets by 14.4 percentage points. Moreover, some models trained with data generated with LR interventions outperformed LLM-based few-shot classification. In contrast, OOSF did not increase model accuracy, implying the need for future work on human-in-the-loop text data generation.
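
The two diversification approaches can be illustrated on a toy next-token distribution. The following is a minimal Python sketch, not the authors' implementation. It assumes that logit suppression subtracts a penalty proportional to how often a token has already been generated (the `penalty` weight, the toy vocabulary and logits, and all function names are hypothetical), and it implements temperature sampling as the standard softmax over logits divided by T. Combined, the sketch samples token i with probability proportional to exp((z_i - penalty * c_i) / T), where z_i is the token's logit, c_i is how many times it has already been generated, and T is the temperature.

```python
import math
import random
from collections import Counter


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; temperature > 1 flattens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def suppress_logits(logits, vocab, counts, penalty):
    """Hypothetical logit suppression: lower each token's logit by
    penalty * (number of times the token was already generated)."""
    return [z - penalty * counts[tok] for z, tok in zip(logits, vocab)]


def entropy(probs):
    """Shannon entropy in nats; higher means a flatter, more diverse distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def generate(n, vocab, logits, temperature=1.0, penalty=0.0, seed=0):
    """Sample n one-token "examples", updating the suppression counts after each draw."""
    rng = random.Random(seed)
    counts = Counter()
    outputs = []
    for _ in range(n):
        probs = softmax(suppress_logits(logits, vocab, counts, penalty), temperature)
        token = rng.choices(vocab, weights=probs, k=1)[0]
        counts[token] += 1
        outputs.append(token)
    return outputs


# Toy next-token distribution (hypothetical values) that strongly favors "flight".
vocab = ["flight", "hotel", "train", "ferry"]
logits = [3.0, 1.0, 0.5, 0.1]

base = softmax(logits)
hot = softmax(logits, temperature=2.0)
history = Counter({"flight": 3, "hotel": 1})  # tokens already produced in earlier examples
suppressed = softmax(suppress_logits(logits, vocab, history, penalty=0.8))

for name, probs in [("base", base), ("T=2.0", hot), ("suppressed", suppressed)]:
    print(f"{name:<11} {[round(p, 3) for p in probs]}  entropy={entropy(probs):.3f}")

for label, kwargs in [("base", {}), ("T=2.0", {"temperature": 2.0}), ("suppressed", {"penalty": 0.8})]:
    sample = generate(20, vocab, logits, **kwargs)
    print(f"{label:<11} distinct tokens in 20 draws: {len(set(sample))}")
```

Raising `temperature` or `penalty` raises the entropy of the sampling distribution, which corresponds to the diversity gain. Nothing in this mechanism checks whether the sampled text still fits the target domain or its label; that is the accuracy cost the human interventions (LR and OOSF) are meant to address.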
