Can Large Language Models Understand Context?
February 1, 2024
Authors: Yilun Zhu, Joel Ruben Antony Moniz, Shruti Bhargava, Jiarui Lu, Dhivya Piraviperumal, Site Li, Yuan Zhang, Hong Yu, Bo-Hsiang Tseng
cs.AI
Abstract
Understanding context is key to understanding human language, an ability that Large Language Models (LLMs) have increasingly been seen to demonstrate to an impressive extent. However, although the evaluation of LLMs spans various domains within Natural Language Processing, limited attention has been paid to probing their linguistic capability to understand contextual features. This paper introduces a context-understanding benchmark by adapting existing datasets to suit the evaluation of generative models. The benchmark comprises four distinct tasks and nine datasets, all featuring prompts designed to assess the models' ability to understand context. First, we evaluate the performance of pre-trained LLMs under the in-context learning scenario. Experimental results indicate that pre-trained dense models struggle to understand more nuanced contextual features when compared with state-of-the-art fine-tuned models. Second, as LLM compression holds growing significance in both research and real-world applications, we assess the context understanding of quantized models under the same in-context learning settings. We find that 3-bit post-training quantization leads to varying degrees of performance degradation on our benchmark. We conduct an extensive analysis of these scenarios to substantiate our experimental results.
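
To make the in-context learning evaluation concrete, the sketch below shows a few-shot prompt being completed by a generative model via Hugging Face Transformers. The model checkpoint, the toy coreference-style task, and the prompt format are illustrative assumptions, not the paper's actual benchmark configuration.

```python
# Minimal sketch of few-shot in-context evaluation of a generative LLM.
# Assumptions: the checkpoint and prompt format are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Few-shot prompt: demonstrations followed by the query, as in
# in-context learning. The task here is a toy context question.
prompt = (
    "Sentence: Alice handed Bob the book because he asked for it.\n"
    "Question: Who does 'he' refer to?\nAnswer: Bob\n\n"
    "Sentence: The cup fell off the table and it broke.\n"
    "Question: What does 'it' refer to?\nAnswer:"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5)
# Decode only the newly generated continuation, not the prompt.
answer = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(answer.strip())
```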
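The abstract also refers to 3-bit post-training quantization. As a rough intuition for what quantizing weights to 3 bits means, the snippet below implements a simplified round-to-nearest (RTN) fake-quantization in PyTorch. This is not the paper's pipeline: practical post-training methods such as GPTQ use calibration data and per-group scales and differ in detail.

```python
# Simplified round-to-nearest (RTN) 3-bit weight quantization in PyTorch.
# Illustrative only: real post-training quantization methods (e.g. GPTQ)
# calibrate on data and use finer-grained scales.
import torch

def quantize_rtn(weight: torch.Tensor, bits: int = 3) -> torch.Tensor:
    """Fake-quantize a weight matrix to `bits` bits, per output row."""
    qmax = 2 ** (bits - 1) - 1          # symmetric 3-bit levels: -4 .. 3
    # One scale per row (per output channel), from the max magnitude.
    scale = weight.abs().amax(dim=1, keepdim=True) / qmax
    scale = scale.clamp(min=1e-8)       # avoid division by zero
    q = torch.clamp(torch.round(weight / scale), -qmax - 1, qmax)
    return q * scale                    # dequantize back to float

w = torch.randn(8, 16)
w_q = quantize_rtn(w, bits=3)
print("mean abs error:", (w - w_q).abs().mean().item())
```

The reconstruction error printed here is the kind of weight-level distortion that, at 3 bits, shows up downstream as the varying degrees of performance degradation the abstract reports.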