Beyond In-Context Learning: Aligning Long-form Generation of Large Language Models via Task-Inherent Attribute Guidelines

Abstract

In-context learning (ICL) is an important yet not fully understood ability of pre-trained large language models (LLMs). It can greatly enhance task performance using a few examples, termed demonstrations, without fine-tuning. Although effective in question answering, ICL often underperforms in long-form generation tasks such as summarization. Under appropriately realistic assumptions, we empirically and theoretically show that ICL demonstrations alone are insufficient to teach LLMs the task language and format distributions for generation. We argue for explicit exposure to the task distributions and hypothesize that defining them by prompting enhances model performance. To this end, we present LongGuide, which efficiently generates two parallel streams of guidelines capturing task language and format properties: (i) Metric Guidelines (MGs) that instruct models to optimize self-evaluated metrics; and (ii) Output Constraint Guidelines (OCGs) that constrain generation at both token and sentence levels. LongGuide automatically selects the best combination of guidelines, improving both strong open- and closed-source LLMs by over 5% in both zero- and few-shot settings. We show that LongGuide is generalizable, can be learned by weak models to enhance strong ones, and integrates synergistically with automatic prompt optimizers.
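
To make the idea of a token- and sentence-level output constraint concrete, here is a minimal sketch. The abstract does not say how OCGs are derived, so this sketch assumes they are simple length statistics computed over the demonstration outputs and phrased as a prompt instruction. The function names, the whitespace tokenizer, the regex sentence splitter, and the guideline wording are all illustrative assumptions, not LongGuide's actual implementation.

```python
import re
from statistics import mean


def count_sentences(text: str) -> int:
    # Assumption: naive splitter on ., !, ? followed by whitespace.
    return len([s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s])


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for model tokens.
    return len(text.split())


def build_output_constraint_guideline(demo_outputs: list[str]) -> str:
    """Turn length statistics of demonstration outputs into a prompt instruction.

    Hypothetical helper for illustration only.
    """
    if not demo_outputs:
        raise ValueError("need at least one demonstration output")
    sents = [count_sentences(o) for o in demo_outputs]
    toks = [count_tokens(o) for o in demo_outputs]
    return (
        f"The response must have from {min(sents)} to {max(sents)} sentences "
        f"and from {min(toks)} to {max(toks)} words, "
        f"with an average of {mean(sents):.1f} sentences and {mean(toks):.1f} words."
    )


# Made-up demonstration outputs, used only to exercise the helper.
demos = [
    "The council approved the budget. Work starts in May.",
    "Sales rose sharply. Costs fell. Profit doubled.",
]
guideline = build_output_constraint_guideline(demos)
print(guideline)
```

A guideline string like this could be appended to a zero- or few-shot prompt; whether it helps on a given task would have to be checked empirically, in line with the abstract's point that the best combination of guidelines is selected automatically.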
