
Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning

July 5, 2023
作者: Waseem AlShikh, Manhal Daaboul, Kirk Goddard, Brock Imel, Kiran Kamble, Parikshith Kulkarni, Melisa Russak
cs.AI

Abstract

In this paper, we introduce the Instruction Following Score (IFS), a metric that detects language models' ability to follow instructions. The metric serves a dual purpose. First, IFS can be used to distinguish between base and instruct models. We benchmark publicly available base and instruct models, and show that the ratio of well-formatted responses to partial and full sentences can be an effective measure for separating those two model classes. Second, the metric can be used as an early stopping criterion for instruct tuning. We compute IFS for Supervised Fine-Tuning (SFT) of 7B and 13B LLaMA models, showing that models learn to follow instructions relatively early in the training process, and that further fine-tuning can result in changes to the underlying base model's semantics. As an example of semantic change, we examine the objectivity of model predictions, as defined by an auxiliary metric, ObjecQA. We show that in this particular case, semantic changes are steepest when the IFS tends to plateau. We hope that decomposing instruct tuning into IFS and semantic factors starts a new trend in better controllable instruct tuning and opens possibilities for designing minimal instruct interfaces for querying foundation models.
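The abstract describes IFS as a ratio of well-formatted responses to partial and full sentences. Below is a minimal sketch of how such a ratio might be computed over a set of model responses; the formatting heuristic, function names, and example responses are illustrative assumptions, not the paper's exact classification rule.

```python
def is_well_formatted(response: str) -> bool:
    """Heuristic classifier (an assumption, not the paper's exact rule):
    a well-formatted answer starts with a capital letter or a list marker
    and ends with terminal punctuation, rather than trailing off mid-sentence."""
    text = response.strip()
    if not text:
        return False
    starts_ok = text[0].isupper() or text[0] in "-*0123456789"
    ends_ok = text[-1] in ".!?"
    return starts_ok and ends_ok

def instruction_following_score(responses: list[str]) -> float:
    """IFS-like ratio: fraction of responses classified as well formatted."""
    if not responses:
        return 0.0
    well_formatted = sum(is_well_formatted(r) for r in responses)
    return well_formatted / len(responses)

# Instruct-style models tend to produce complete, self-contained answers,
# while base models often continue the prompt mid-sentence.
instruct_like = ["Paris is the capital of France.", "1. Boil water. 2. Add tea."]
base_like = ["and then the capital of France, which is"]
print(instruction_following_score(instruct_like))  # 1.0
print(instruction_following_score(base_like))      # 0.0
```

Tracking this ratio on a fixed probe set during SFT would let one stop fine-tuning once the score plateaus, which is the early-stopping use the paper proposes.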