Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model

Abstract

Transformer-based large language models (LLMs) exhibit limitations such as generating unsafe responses and reasoning unreliably. Existing inference intervention approaches attempt to mitigate these issues by finetuning additional models to produce calibration signals (such as rewards) that guide the LLM's decoding process. However, this solution introduces substantial time and space overhead because separate models are required. This work proposes Non-disruptive parameters insertion (Otter), which inserts extra parameters into the transformer architecture to predict calibration signals along with the original LLM output. Otter offers state-of-the-art performance on multiple demanding tasks while saving up to 86.5% of the extra space and 98.5% of the extra time. Furthermore, Otter integrates seamlessly with existing inference engines, requiring only a one-line code change, and the original model response remains accessible after the parameter insertion. Our code is publicly available at https://github.com/chenhan97/Otter
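The idea of inserting parameters so that a calibration signal is predicted alongside an unchanged original output can be illustrated with a toy feed-forward block. The sketch below is not the authors' implementation: the function names (ffn, insert_parameters), the ReLU block, the dimensions, and the block-diagonal layout are all assumptions chosen for illustration. It only shows that widening a layer with extra rows and a block-diagonal output projection leaves the original output recoverable as a slice.

```python
import numpy as np


def ffn(x, w1, w2):
    """Toy two-layer feed-forward block: w2 @ relu(w1 @ x)."""
    return w2 @ np.maximum(w1 @ x, 0.0)


def insert_parameters(w1, w2, e1, e2):
    """Widen the block with extra parameters (hypothetical helper).

    w1, w2 are the frozen original weights; e1, e2 are the inserted ones.
    The output projection is block-diagonal, so the inserted hidden units
    never feed into the original output dimensions.
    """
    w1_ext = np.vstack([w1, e1])  # shape (h + h_extra, d)
    top = np.hstack([w2, np.zeros((w2.shape[0], e2.shape[1]))])
    bottom = np.hstack([np.zeros((e2.shape[0], w2.shape[1])), e2])
    w2_ext = np.vstack([top, bottom])  # shape (d + r, h + h_extra)
    return w1_ext, w2_ext


# Illustrative sizes only: model width d, hidden width h,
# inserted hidden width h_extra, calibration signal width r.
rng = np.random.default_rng(0)
d, h, h_extra, r = 8, 16, 4, 1
w1 = rng.normal(size=(h, d))
w2 = rng.normal(size=(d, h))
e1 = rng.normal(size=(h_extra, d))
e2 = rng.normal(size=(r, h_extra))
x = rng.normal(size=d)

w1_ext, w2_ext = insert_parameters(w1, w2, e1, e2)
out = ffn(x, w1_ext, w2_ext)

original_part = out[:d]  # matches the output of the unmodified block
signal_part = out[d:]    # extra output, e.g. a scalar reward estimate
```

In this toy setting a single forward pass yields both the original output and the extra signal, which is the property the abstract attributes to Otter; how the actual method handles attention layers and training is not covered here.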
