Instruction-following Evaluation through Verbalizer Manipulation
July 20, 2023
Authors: Shiyang Li, Jun Yan, Hai Wang, Zheng Tang, Xiang Ren, Vijay Srinivasan, Hongxia Jin
cs.AI
Abstract
While instruction-tuned models have shown remarkable success in various
natural language processing tasks, accurately evaluating their ability to
follow instructions remains challenging. Existing benchmarks primarily focus on
common instructions that align well with what the model learned during
training. However, proficiency in responding to these instructions does not
necessarily imply strong ability in instruction following. In this paper, we
propose a novel instruction-following evaluation protocol called verbalizer
manipulation. It instructs the model to verbalize the task label with words
aligning with model priors to different extents, adopting verbalizers from
highly aligned (e.g., outputting ``positive'' for positive sentiment), to
minimally aligned (e.g., outputting ``negative'' for positive sentiment).
Verbalizer manipulation can be seamlessly integrated with any classification
benchmark to examine the model's reliance on priors and its ability to override
them to accurately follow the instructions. We conduct a comprehensive
evaluation of four major model families across nine datasets, employing twelve
sets of verbalizers for each of them. We observe that the instruction-following
abilities of models, across different families and scales, are significantly
distinguished by their performance on less natural verbalizers. Even the
strongest GPT-4 model struggles to perform better than random guessing on the
most challenging verbalizer, emphasizing the need for continued advancements to
improve their instruction-following abilities.
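The core idea of verbalizer manipulation can be sketched as follows: a verbalizer maps each task label to the word the model is instructed to output, and flipping that mapping tests whether the model follows the instruction or falls back on its priors. This is a minimal illustration, not the paper's exact prompt templates; the verbalizer sets and wording below are hypothetical.

```python
# Sketch of verbalizer manipulation on a binary sentiment task.
# "natural" aligns with model priors; "flipped" inverts the label words,
# forcing the model to override its priors to answer correctly.
VERBALIZERS = {
    "natural": {"positive": "positive", "negative": "negative"},
    "flipped": {"positive": "negative", "negative": "positive"},
}

def build_prompt(text: str, verbalizer: dict) -> str:
    """Build an instruction telling the model which word to emit per label."""
    rules = "; ".join(
        f'answer "{word}" if the sentiment is {label}'
        for label, word in verbalizer.items()
    )
    return (
        f"Classify the sentiment of the review. {rules}.\n"
        f"Review: {text}\nAnswer:"
    )

def is_correct(model_output: str, gold_label: str, verbalizer: dict) -> bool:
    """The answer counts as correct only if it matches the verbalized gold label."""
    return model_output.strip().lower() == verbalizer[gold_label]
```

Under the flipped verbalizer, a model that outputs ``positive'' for a positive review is scored as wrong, since the instruction asked for the inverted word; this is what separates prior-driven responding from genuine instruction following.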