Beyond In-Context Learning: Aligning Long-form Generation of Large Language Models via Task-Inherent Attribute Guidelines
June 2, 2025
Authors: Do Xuan Long, Duong Ngoc Yen, Do Xuan Trong, Luu Anh Tuan, Kenji Kawaguchi, Shafiq Joty, Min-Yen Kan, Nancy F. Chen
cs.AI
Abstract
In-context learning (ICL) is an important yet not fully understood ability of
pre-trained large language models (LLMs). It can greatly enhance task
performance using a few examples, termed demonstrations, without fine-tuning.
Although effective in question answering, ICL often underperforms in long-form
generation tasks such as summarization. Under appropriately realistic
assumptions, we empirically and theoretically show that ICL demonstrations
alone are insufficient to teach LLMs the task language and format distributions
for generation. We argue for explicit exposure to the task distributions and
hypothesize that defining them by prompting enhances model performance. To this
end, we present LongGuide, which efficiently generates two parallel streams of
guidelines capturing task language and format properties: (i) Metric Guidelines
(MGs) that instruct models to optimize self-evaluated metrics; and (ii) Output
Constraint Guidelines (OCGs) that constrain generation at both token and
sentence levels. LongGuide automatically selects the best combination of
guidelines, improving both strong open- and closed-source LLMs by over 5% in
both zero- and few-shot settings. We show that LongGuide is generalizable,
learnable by weak models to enhance strong ones, and integrates synergistically
with automatic prompt optimizers.
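
The abstract describes LongGuide only at a high level. Below is a minimal sketch of how its two guideline streams and the selection step might fit together, assuming a generic llm(prompt) completion function, a score(output, reference) quality metric, and illustrative prompts; none of these names or prompts come from the paper itself.

```python
# Illustrative sketch of the LongGuide pipeline as summarized in the abstract.
# The prompts, helper names, and the dev-set selection criterion are all
# assumptions for illustration, not the paper's actual implementation.
from typing import Callable, List, Tuple

def metric_guidelines(llm: Callable[[str], str], task: str,
                      demos: List[str]) -> str:
    """Stream (i): ask the model for quality metrics it should self-optimize (MGs)."""
    prompt = (f"Task: {task}\nExamples:\n" + "\n".join(demos) +
              "\nList the quality criteria an ideal output should satisfy, "
              "phrased as instructions.")
    return llm(prompt)

def output_constraint_guidelines(demos: List[str]) -> str:
    """Stream (ii): derive token- and sentence-level constraints (OCGs)
    from demonstration statistics, e.g. typical output length."""
    words = [len(d.split()) for d in demos]
    sents = [max(d.count("."), 1) for d in demos]
    return (f"Answer in roughly {min(words)}-{max(words)} words and "
            f"about {round(sum(sents) / len(sents))} sentences.")

def select_guidelines(llm: Callable[[str], str], task: str,
                      mg: str, ocg: str,
                      dev_set: List[Tuple[str, str]],
                      score: Callable[[str, str], float]) -> str:
    """Evaluate each guideline combination on a small dev set; keep the best."""
    combos = {"none": "", "mg": mg, "ocg": ocg, "both": mg + "\n" + ocg}
    def total(guide: str) -> float:
        return sum(score(llm(f"{task}\n{guide}\nInput: {x}\nOutput:"), y)
                   for x, y in dev_set)
    return max(combos.values(), key=total)
```

In this reading, the selected guideline string is simply prepended to the task prompt at inference time, which is also what would let guidelines produced by a weak model be reused to steer a stronger one.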