Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning
July 5, 2023
Authors: Waseem AlShikh, Manhal Daaboul, Kirk Goddard, Brock Imel, Kiran Kamble, Parikshith Kulkarni, Melisa Russak
cs.AI
Abstract
In this paper, we introduce the Instruction Following Score (IFS), a metric
that detects language models' ability to follow instructions. The metric has a
dual purpose. First, IFS can be used to distinguish between base and instruct
models. We benchmark publicly available base and instruct models, and show
that the ratio of well-formatted responses to partial and full sentences can
be an effective measure for separating those two model classes. Second, the
metric can be used as an early stopping criterion for instruct tuning. We
compute IFS for Supervised Fine-Tuning (SFT) of 7B and 13B LLaMA models,
showing that models learn to follow instructions relatively early in the
training process, and that further fine-tuning can result in changes in the
underlying base model semantics. As an example of semantic change, we show the
objectivity of model predictions, as defined by an auxiliary metric, ObjecQA.
We show that in this particular case, semantic changes are steepest when the
IFS tends to plateau. We hope that decomposing instruct tuning into IFS and
semantic factors starts a new trend in better controllable instruct tuning and
opens possibilities for designing minimal instruct interfaces for querying
foundation models.
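To make the two uses of the metric concrete, here is a minimal sketch of how
an IFS-style score and a plateau-based early stop could be wired into an SFT
evaluation loop. The abstract only defines IFS as a ratio of well-formatted
responses to partial and full sentences, so the classifier heuristic, function
names, and the `window`/`eps` thresholds below are illustrative assumptions,
not the authors' implementation.

```python
# Sketch of an IFS-style metric and a plateau-based early-stopping check.
# Heuristics and hyperparameters are assumptions for illustration only.

from typing import Callable, List


def is_well_formatted(response: str) -> bool:
    """Hypothetical heuristic: treat a response as 'well formatted' if it
    reads as a standalone answer rather than a continuation fragment."""
    text = response.strip()
    # Assumption: an answer-like response begins a new sentence and ends
    # with terminal punctuation, instead of continuing the prompt mid-sentence.
    return bool(text) and text[0].isupper() and text.endswith((".", "!", "?"))


def instruction_following_score(responses: List[str]) -> float:
    """IFS here = fraction of responses classified as well-formatted answers."""
    if not responses:
        return 0.0
    return sum(is_well_formatted(r) for r in responses) / len(responses)


def should_stop(ifs_history: List[float], window: int = 3,
                eps: float = 0.01) -> bool:
    """Early-stopping sketch: stop once IFS has plateaued, i.e. the score
    moved by less than `eps` over the last `window` evaluations.
    `window` and `eps` are assumed hyperparameters."""
    if len(ifs_history) < window + 1:
        return False
    recent = ifs_history[-(window + 1):]
    return max(recent) - min(recent) < eps


def sft_eval_step(generate: Callable[[str], str],
                  prompts: List[str],
                  ifs_history: List[float]) -> bool:
    """Run periodically during SFT on a fixed set of evaluation prompts;
    `generate` stands in for whatever decoding loop the training code uses.
    Returns True when fine-tuning can be stopped."""
    ifs_history.append(
        instruction_following_score([generate(p) for p in prompts]))
    return should_stop(ifs_history)
```

In this reading, the plateau check operationalizes the paper's observation
that instruction-following is learned early: once IFS stops rising, further
SFT steps mainly risk shifting base-model semantics (e.g., the objectivity
measured by ObjecQA) rather than improving instruction following.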