Can LLM-Generated Textual Explanations Enhance Model Classification Performance? An Empirical Study
August 13, 2025
Authors: Mahdi Dhaini, Juraj Vladika, Ege Erdogan, Zineb Attaoui, Gjergji Kasneci
cs.AI
Abstract
In the rapidly evolving field of Explainable Natural Language Processing
(NLP), textual explanations, i.e., human-like rationales, are pivotal for
explaining model predictions and enriching datasets with interpretable labels.
Traditional approaches rely on human annotation, which is costly,
labor-intensive, and impedes scalability. In this work, we present an automated
framework that leverages multiple state-of-the-art large language models (LLMs)
to generate high-quality textual explanations. We rigorously assess the quality
of these LLM-generated explanations using a comprehensive suite of Natural
Language Generation (NLG) metrics. Furthermore, we investigate the downstream
impact of these explanations on the performance of pre-trained language models
(PLMs) and LLMs across natural language inference tasks on two diverse
benchmark datasets. Our experiments demonstrate that automated explanations
exhibit highly competitive effectiveness compared to human-annotated
explanations in improving model performance. Our findings underscore a
promising avenue for scalable, automated LLM-based textual explanation
generation for extending NLP datasets and enhancing model performance.
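The downstream setup described above — feeding explanations alongside the original input to a classifier — can be illustrated with a minimal sketch. Note that the function name, field names, and the stub explanation below are hypothetical illustrations, not the paper's actual implementation; in the paper's setting the explanation would be produced by one of the LLMs and the string would be tokenized for a PLM or prompted to an LLM.

```python
def format_nli_input(premise, hypothesis, explanation=""):
    """Build a single input string for an NLI classifier.

    When an explanation (human-annotated or LLM-generated) is
    available, it is appended as an extra input segment so the
    model can condition its prediction on the rationale.
    """
    text = f"premise: {premise} hypothesis: {hypothesis}"
    if explanation:
        text += f" explanation: {explanation}"
    return text


example = {
    "premise": "A man is playing a guitar on stage.",
    "hypothesis": "A person is performing music.",
    # Stub rationale; in practice this would come from an LLM.
    "explanation": "Playing a guitar on stage is a form of musical performance.",
}

print(format_nli_input(**example))
```

Comparing a model trained on inputs with and without the appended explanation segment is one way to measure the performance impact the abstract refers to.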