Prompt Engineering a Prompt Engineer
November 9, 2023
Authors: Qinyuan Ye, Maxamed Axmed, Reid Pryzant, Fereshte Khani
cs.AI
Abstract
Prompt engineering is a challenging yet crucial task for optimizing the
performance of large language models (LLMs). It requires complex reasoning to
examine the model's errors, hypothesize what is missing or misleading in the
current prompt, and communicate the task with clarity. While recent works
indicate that LLMs can be meta-prompted to perform automatic prompt
engineering, their potential may not be fully realized due to the lack of
sufficient guidance in the meta-prompt to elicit the LLMs' complex reasoning
capabilities. In this work, we investigate the problem of "prompt engineering a
prompt engineer" -- constructing a meta-prompt that more effectively guides
LLMs to perform automatic prompt engineering. We introduce and analyze key
components, such as a step-by-step reasoning template and context
specification, which lead to improved performance. In addition, inspired by
common optimization concepts such as batch size, step size and momentum, we
introduce their verbalized counterparts to the meta-prompt and investigate
their effects. Our final method, named PE2, finds a prompt that outperforms
"let's think step by step" by 6.3% on the MultiArith dataset and 3.1% on the
GSM8K dataset. To demonstrate its versatility, we apply PE2 to the Instruction
Induction benchmark, a suite of counterfactual tasks, and a lengthy, real-world
industrial prompt. In these settings, PE2 achieves strong performance and
outperforms prior automatic prompt engineering baselines. Further, we show that
PE2 makes meaningful and targeted prompt edits, amends erroneous or incomplete
prompts, and exhibits non-trivial counterfactual reasoning abilities.
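
For concreteness, the loop the abstract describes can be sketched in a few lines of Python. Everything below is an illustrative assumption, not the paper's actual PE2 template: `query_llm` is a hypothetical stand-in for any chat-completion API, and the meta-prompt wording is paraphrased. The verbalized optimization concepts map onto code as follows: batch size is the number of failing examples the meta-prompt inspects per step, step size caps how much of the prompt may change in one edit, and momentum is a summary of recent edits carried into the next step.

```python
import random

def query_llm(prompt: str) -> str:
    """Hypothetical helper; plug in your LLM API call here."""
    raise NotImplementedError

# Illustrative meta-prompt with a step-by-step reasoning template and an
# explicit context specification (paraphrased, not the paper's template).
META_PROMPT = """\
You are improving a task prompt.

## Task context
{context}

## Current prompt
{prompt}

## Failing examples (input / model output / expected output)
{failures}

## Recent edit history
{history}

Reason step by step:
1. For each failing example, explain why the current prompt led to the error.
2. Hypothesize what is missing or misleading in the prompt.
3. Propose an edited prompt that fixes these issues.
Change at most {step_size} phrases. Output only the new prompt.
"""

def refine_prompt(prompt, failing_examples, context, history,
                  batch_size=3, step_size=2):
    # "Batch size": how many failing examples the meta-prompt sees per step.
    batch = random.sample(failing_examples, min(batch_size, len(failing_examples)))
    failures = "\n".join(
        f"- {ex['input']} / {ex['output']} / {ex['expected']}" for ex in batch
    )
    meta = META_PROMPT.format(
        context=context,
        prompt=prompt,
        failures=failures,
        history="\n".join(history[-3:]) or "(none)",  # "momentum"
        step_size=step_size,                          # "step size"
    )
    new_prompt = query_llm(meta)
    history.append(f"{prompt!r} -> {new_prompt!r}")
    return new_prompt
```

In a full run, `refine_prompt` would be called for several iterations, scoring each candidate prompt on a held-out dev set after every edit and keeping the best-scoring one, as in prior automatic prompt engineering pipelines.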