Eliciting Fine-Tuned Transformer Capabilities via Inference-Time Techniques

Abstract

Large language models have transformed natural language processing, yet supervised fine-tuning (SFT) remains computationally intensive. This paper formally proves that capabilities acquired through SFT can be approximated by a base transformer model using inference-time techniques, specifically in-context learning (ICL), without altering model parameters, under idealized assumptions including unbounded computational resources and access to the fine-tuning dataset. We extend these results to practical scenarios with finite context lengths and partial dataset access. For text generation tasks with fixed output length $l$, datasets of size $O\left( \frac{m V}{\varepsilon^2} \log \frac{m}{\delta} \right)$ or, with bounded context, $O\left( \frac{l \log V}{\varepsilon^2} \log \frac{1}{\delta} \right)$ suffice to approximate fine-tuned behavior across $m$ contexts within error $\varepsilon$, where $V$ is the vocabulary size and $\delta$ is the failure probability. For linear classification, datasets of size $O\left( \frac{d}{\varepsilon} \right)$ or, with fixed context, $O\left( \frac{1}{\varepsilon^2} \log \frac{1}{\delta} \right)$ are sufficient, where $d$ is the input dimension. Grounded in the Turing completeness of transformers, these results provide a theoretical foundation for resource-efficient deployment of large language models, with practical techniques like retrieval-augmented generation bridging theory to real-world applications.
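To get a feel for how the four dataset-size bounds in the abstract scale, the following is a minimal sketch that evaluates each expression numerically. The abstract states the bounds only up to big-O, so the constant factor `c` (set to 1 here) is an assumption, as are the function names and the use of the natural logarithm; the outputs indicate relative growth, not actual sample counts.

```python
import math


def text_generation_bound(m, V, eps, delta, c=1.0):
    """(m V / eps^2) * log(m / delta): text generation across m contexts.

    c is a hypothetical constant standing in for the factor hidden by O(.).
    """
    return c * (m * V / eps**2) * math.log(m / delta)


def text_generation_bound_bounded_context(l, V, eps, delta, c=1.0):
    """(l log V / eps^2) * log(1 / delta): bounded-context variant."""
    return c * (l * math.log(V) / eps**2) * math.log(1 / delta)


def linear_classification_bound(d, eps, c=1.0):
    """d / eps: linear classification with input dimension d."""
    return c * d / eps


def linear_classification_bound_fixed_context(eps, delta, c=1.0):
    """(1 / eps^2) * log(1 / delta): fixed-context variant."""
    return c * (1 / eps**2) * math.log(1 / delta)


if __name__ == "__main__":
    # Illustrative values only; none of these come from the paper.
    m, V, l, d = 100, 50_000, 64, 768
    eps, delta = 0.1, 0.05
    print(text_generation_bound(m, V, eps, delta))
    print(text_generation_bound_bounded_context(l, V, eps, delta))
    print(linear_classification_bound(d, eps))
    print(linear_classification_bound_fixed_context(eps, delta))
```

With these illustrative inputs, the bounded-context text generation expression is several orders of magnitude smaller than the unbounded one, because it depends on $\log V$ rather than on $V$ and does not grow with $m$.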
