ChatPaper.ai

Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models

January 20, 2026
Authors: Hyunjong Ok, Jaeho Lee
cs.AI

Abstract

Large language models exhibit surprising sensitivity to the structure of the prompt, but the mechanisms underlying this sensitivity remain poorly understood. In this work, we conduct an in-depth investigation of a striking case: in multiple-choice question answering, placing the context before the question and options (CQO) outperforms the reverse order (QOC) by over 14 percentage points, consistently across a wide range of models and datasets. Through systematic architectural analysis, we identify causal attention as the core mechanism: in QOC prompts, the causal mask prevents option tokens from attending to the context, creating an information bottleneck where the context becomes invisible to the options.
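The visibility asymmetry described above follows directly from the lower-triangular causal mask used in decoder-only transformers: a token can only attend to positions at or before its own. A minimal sketch (the segment names and lengths are illustrative, not taken from the paper) makes the CQO vs. QOC difference concrete:

```python
import numpy as np

def causal_visibility(segments):
    """Given an ordered list of (name, length) prompt segments, return
    which segments each segment's tokens can attend to under a standard
    causal (lower-triangular) attention mask."""
    n = sum(length for _, length in segments)
    mask = np.tril(np.ones((n, n), dtype=bool))  # True where attention is allowed

    # Map each token position to the name of its segment.
    pos2seg = []
    for name, length in segments:
        pos2seg += [name] * length

    visible = {}
    for name, _ in segments:
        idx = [i for i, s in enumerate(pos2seg) if s == name]
        visible[name] = {pos2seg[j] for i in idx for j in range(n) if mask[i, j]}
    return visible

# CQO: context comes first, so option tokens can attend to it.
cqo = causal_visibility([("context", 4), ("question", 2), ("options", 3)])
# QOC: context comes last, so option tokens never see it.
qoc = causal_visibility([("question", 2), ("options", 3), ("context", 4)])

print(sorted(cqo["options"]))  # ['context', 'options', 'question']
print(sorted(qoc["options"]))  # ['options', 'question']
```

In the QOC ordering, the option tokens' attention rows end before the context positions begin, so no amount of attention weighting can route context information into them; this is the information bottleneck the abstract refers to.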