What Makes a Good Natural Language Prompt?
June 7, 2025
Authors: Do Xuan Long, Duy Dinh, Ngoc-Hai Nguyen, Kenji Kawaguchi, Nancy F. Chen, Shafiq Joty, Min-Yen Kan
cs.AI
Abstract
As large language models (LLMs) have progressed towards more human-like
communication and human--AI interaction has become prevalent, prompting has
emerged as a decisive component. However, there is limited conceptual
consensus on how exactly to quantify natural language prompts. We attempt to
address this question by conducting a meta-analysis that surveys more than 150
prompting-related papers from leading NLP and AI conferences (2022--2025) and
related blogs. We propose a property- and human-centric framework for evaluating
prompt quality, encompassing 21 properties categorized into six dimensions. We
then examine how existing studies assess the impact of these properties on
LLMs, revealing imbalanced support across models and tasks and substantial
research gaps.
Further, we analyze correlations among properties in high-quality natural
language prompts, deriving prompting recommendations. We then empirically
explore multi-property prompt enhancements in reasoning tasks, observing that
single-property enhancements often have the greatest impact. Finally, we
discover that instruction-tuning on property-enhanced prompts can result in
better reasoning models. Our findings establish a foundation for
property-centric prompt evaluation and optimization, bridging gaps in
human--AI communication and opening new directions for prompting research.