Can Large Language Models Understand Context?
February 1, 2024
Authors: Yilun Zhu, Joel Ruben Antony Moniz, Shruti Bhargava, Jiarui Lu, Dhivya Piraviperumal, Site Li, Yuan Zhang, Hong Yu, Bo-Hsiang Tseng
cs.AI
Abstract
Understanding context is key to understanding human language, an ability that Large Language Models (LLMs) have increasingly been seen to demonstrate to an impressive extent. However, although the evaluation of LLMs spans various domains within Natural Language Processing, limited attention has been paid to probing their linguistic capability to understand contextual features. This paper introduces a context-understanding benchmark by adapting existing datasets to suit the evaluation of generative models. The benchmark comprises four distinct tasks and nine datasets, all featuring prompts designed to assess the models' ability to understand context. First, we evaluate the performance of pre-trained LLMs under the in-context learning scenario. Experimental results indicate that pre-trained dense models struggle to understand more nuanced contextual features when compared with state-of-the-art fine-tuned models. Second, as LLM compression holds growing significance in both research and real-world applications, we assess the context understanding of quantized models under the same in-context learning settings. We find that 3-bit post-training quantization leads to varying degrees of performance degradation on our benchmark. We conduct an extensive analysis of these scenarios to substantiate our experimental results.
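
To make the in-context learning evaluation concrete, the sketch below shows a few-shot prompt being completed by a generative model via Hugging Face Transformers. The model checkpoint, the toy coreference-style task, and the prompt format are illustrative assumptions, not the paper's actual benchmark configuration.

```python
# Minimal sketch of few-shot in-context evaluation of a generative LLM.
# Assumptions: the checkpoint and prompt format are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Few-shot prompt: demonstrations followed by the query, as in
# in-context learning. The task here is a toy context question.
prompt = (
    "Sentence: Alice handed Bob the book because he asked for it.\n"
    "Question: Who does 'he' refer to?\nAnswer: Bob\n\n"
    "Sentence: The cup fell off the table and it broke.\n"
    "Question: What does 'it' refer to?\nAnswer:"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5)
# Decode only the newly generated continuation, not the prompt.
answer = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(answer.strip())
```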
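The abstract also refers to 3-bit post-training quantization. As a rough intuition for what quantizing weights to 3 bits means, the snippet below implements a simplified round-to-nearest (RTN) fake-quantization in PyTorch. This is not the paper's pipeline: practical post-training methods such as GPTQ use calibration data and per-group scales and differ in detail.

```python
# Simplified round-to-nearest (RTN) 3-bit weight quantization in PyTorch.
# Illustrative only: real post-training quantization methods (e.g. GPTQ)
# calibrate on data and use finer-grained scales.
import torch

def quantize_rtn(weight: torch.Tensor, bits: int = 3) -> torch.Tensor:
    """Fake-quantize a weight matrix to `bits` bits, per output row."""
    qmax = 2 ** (bits - 1) - 1          # symmetric 3-bit levels: -4 .. 3
    # One scale per row (per output channel), from the max magnitude.
    scale = weight.abs().amax(dim=1, keepdim=True) / qmax
    scale = scale.clamp(min=1e-8)       # avoid division by zero
    q = torch.clamp(torch.round(weight / scale), -qmax - 1, qmax)
    return q * scale                    # dequantize back to float

w = torch.randn(8, 16)
w_q = quantize_rtn(w, bits=3)
print("mean abs error:", (w - w_q).abs().mean().item())
```

The reconstruction error printed here is the kind of weight-level distortion that, at 3 bits, shows up downstream as the varying degrees of performance degradation the abstract reports.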