Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model
August 20, 2024
Authors: Chenhan Yuan, Fei Huang, Ru Peng, Keming Lu, Bowen Yu, Chang Zhou, Jingren Zhou
cs.AI
Abstract
Transformer-based large language models (LLMs) exhibit limitations such as
generating unsafe responses, unreliable reasoning, etc. Existing inference
intervention approaches attempt to mitigate these issues by finetuning
additional models to produce calibration signals (such as rewards) that guide
the LLM's decoding process. However, this solution introduces substantial time
and space overhead due to the separate models required. This work proposes
Non-disruptive parameter insertion (Otter), which inserts extra parameters into
the transformer architecture to predict calibration signals along with the
original LLM output. Otter offers state-of-the-art performance on multiple
demanding tasks while saving up to 86.5% extra space and 98.5% extra time.
Furthermore, Otter seamlessly integrates with existing inference engines,
requiring only a one-line code change, and the original model response remains
accessible after the parameter insertion. Our code is publicly available at
https://github.com/chenhan97/Otter.