ReFT: Representation Finetuning for Language Models

April 4, 2024
作者: Zhengxuan Wu, Aryaman Arora, Zheng Wang, Atticus Geiger, Dan Jurafsky, Christopher D. Manning, Christopher Potts
cs.AI

Abstract

Parameter-efficient fine-tuning (PEFT) methods seek to adapt large models via updates to a small number of weights. However, much prior interpretability work has shown that representations encode rich semantic information, suggesting that editing representations might be a more powerful alternative. Here, we pursue this hypothesis by developing a family of Representation Finetuning (ReFT) methods. ReFT methods operate on a frozen base model and learn task-specific interventions on hidden representations. We define a strong instance of the ReFT family, Low-rank Linear Subspace ReFT (LoReFT). LoReFT is a drop-in replacement for existing PEFTs and learns interventions that are 10x-50x more parameter-efficient than prior state-of-the-art PEFTs. We showcase LoReFT on eight commonsense reasoning tasks, four arithmetic reasoning tasks, Alpaca-Eval v1.0, and GLUE. In all these evaluations, LoReFT delivers the best balance of efficiency and performance, and almost always outperforms state-of-the-art PEFTs. We release a generic ReFT training library publicly at https://github.com/stanfordnlp/pyreft.
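The abstract describes learning task-specific interventions on hidden representations within a low-rank linear subspace. The sketch below illustrates the general idea with NumPy: a hidden state is edited only along an r-dimensional subspace spanned by a matrix R with orthonormal rows, via an intervention of the form h + Rᵀ(Wh + b − Rh), while the component of h orthogonal to that subspace is left untouched. This is a minimal toy illustration, not the paper's implementation; the dimensions, initialization, and variable names here are hypothetical (the authors' actual code is in the pyreft library linked above).

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 4  # hidden size and subspace rank (toy values, not from the paper)

# R: projection onto an r-dim subspace, with orthonormal rows (built via QR).
# In training, R, W, and b would be learned; here they are random placeholders.
R = np.linalg.qr(rng.standard_normal((d, r)))[0].T  # shape (r, d), R @ R.T = I_r
W = rng.standard_normal((r, d)) * 0.1               # learned linear map (placeholder)
b = np.zeros(r)                                     # learned bias (placeholder)

def low_rank_subspace_intervention(h):
    """Edit hidden state h only inside the subspace spanned by the rows of R:
    replace the subspace coordinates R @ h with the target W @ h + b."""
    return h + R.T @ (W @ h + b - R @ h)

h = rng.standard_normal(d)
h_edited = low_rank_subspace_intervention(h)
```

Because R has orthonormal rows, the projection of the hidden state onto the orthogonal complement of the subspace is unchanged by the intervention, which is what makes the edit low-rank: only r directions of the d-dimensional representation are modified.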
