

ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs

October 16, 2024
作者: Jingming Zhuo, Songyang Zhang, Xinyu Fang, Haodong Duan, Dahua Lin, Kai Chen
cs.AI

Abstract

Large language models (LLMs) have demonstrated impressive capabilities across various tasks, but their performance is highly sensitive to the prompts utilized. This variability poses challenges for accurate assessment and user satisfaction. Current research frequently overlooks instance-level prompt variations and their implications for subjective evaluations. To address these shortcomings, we introduce ProSA, a framework designed to evaluate and comprehend prompt sensitivity in LLMs. ProSA incorporates a novel sensitivity metric, PromptSensiScore, and leverages decoding confidence to elucidate underlying mechanisms. Our extensive study, spanning multiple tasks, uncovers that prompt sensitivity fluctuates across datasets and models, with larger models exhibiting enhanced robustness. We observe that few-shot examples can alleviate this sensitivity issue, and that subjective evaluations are also susceptible to prompt sensitivity, particularly in complex, reasoning-oriented tasks. Furthermore, our findings indicate that higher model confidence correlates with increased prompt robustness. We believe this work will serve as a helpful tool for studying the prompt sensitivity of LLMs. The project is released at: https://github.com/open-compass/ProSA.
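To make the idea of instance-level prompt sensitivity concrete, here is a minimal, hypothetical sketch: each test instance is evaluated under several paraphrased prompts, and sensitivity is taken as the average rate at which prompt variants disagree on correctness for the same instance. Note this is an illustrative stand-in only; the actual PromptSensiScore metric is defined in the paper and the linked repository, and the function and data names below are invented for this example.

```python
def prompt_sensitivity(results):
    """Toy instance-level sensitivity score (not the paper's PromptSensiScore).

    results: dict mapping instance_id -> list of 0/1 correctness values,
    one entry per prompt variant. Returns the mean per-instance pairwise
    disagreement rate: 0.0 means fully robust to prompt wording, higher
    values mean the model's correctness flips with the prompt.
    """
    per_instance = []
    for scores in results.values():
        n = len(scores)
        ones = sum(scores)
        # Number of variant pairs that disagree (one correct, one incorrect).
        disagreeing_pairs = ones * (n - ones)
        total_pairs = n * (n - 1) / 2
        per_instance.append(disagreeing_pairs / total_pairs if total_pairs else 0.0)
    return sum(per_instance) / len(per_instance)

# Toy run: 3 instances, each answered under 4 prompt paraphrases.
toy = {
    "q1": [1, 1, 1, 1],  # robust: correct under every paraphrase
    "q2": [1, 0, 1, 0],  # sensitive: correctness flips with the prompt
    "q3": [0, 0, 0, 1],  # mildly sensitive
}
print(round(prompt_sensitivity(toy), 3))  # prints 0.389
```

A per-instance (rather than dataset-level) score like this reflects the abstract's point that aggregate accuracy can hide instances whose answers flip entirely depending on prompt wording.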
