What Makes a Good Natural Language Prompt?

Abstract

As large language models (LLMs) have become more human-like and human-AI communication has become prevalent, prompting has emerged as a decisive component. However, there is limited conceptual consensus on how natural language prompts should be quantified. We attempt to address this question through a meta-analysis of more than 150 prompting-related papers from leading NLP and AI conferences (2022 to 2025) and blogs. We propose a property- and human-centric framework for evaluating prompt quality, encompassing 21 properties categorized into six dimensions. We then examine how existing studies assess the impact of these properties on LLMs, revealing imbalanced support across models and tasks as well as substantial research gaps. Further, we analyze correlations among properties in high-quality natural language prompts and derive prompting recommendations. We then empirically explore multi-property prompt enhancements in reasoning tasks, observing that single-property enhancements often have the greatest impact. Finally, we find that instruction-tuning on property-enhanced prompts can result in better reasoning models. Our findings establish a foundation for property-centric prompt evaluation and optimization, bridging gaps in human-AI communication and opening new prompting research directions.

