BenTo: Benchmark Task Reduction with In-Context Transferability

Abstract

Evaluating large language models (LLMs) is costly: it requires generating and examining LLM outputs on large-scale benchmarks that span many tasks. This paper investigates how to efficiently reduce the number of tasks used to benchmark LLMs without degrading evaluation quality. Our study shows that task transferability and relevance provide the critical information needed to identify the most representative subset of tasks by optimizing a facility location function. We propose a practically efficient metric that estimates the transferability between two tasks via in-context learning (ICL). By analyzing pairwise transferability, we can reduce the tasks in a modern LLM benchmark (e.g., MMLU or FLAN) to 5% of the original set while inducing a difference of less than 4% relative to evaluation on the full benchmark. Compared with prior work, our method is training-free, gradient-free, and highly efficient, requiring only ICL.
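
To make the facility location step more concrete, here is a minimal sketch of greedy subset selection over a pairwise task-similarity matrix. The abstract does not give the exact objective or optimizer, so everything here is an assumption for illustration: the function name `greedy_facility_location`, the convention that `sim[i, j]` scores how well task `i` represents task `j`, the non-negativity of the scores, and the toy 4-task matrix. In the paper's setting the matrix would presumably be derived from ICL-based transferability estimates.

```python
import numpy as np


def greedy_facility_location(sim, k):
    """Greedily pick k tasks maximizing F(S) = sum_j max_{i in S} sim[i, j].

    Assumes sim is a square array of non-negative scores (hypothetical
    convention: sim[i, j] = how well task i represents task j).
    """
    n = sim.shape[0]
    chosen = np.zeros(n, dtype=bool)  # mask of already selected tasks
    best = np.zeros(n)                # best[j] = current coverage of task j
    selected = []
    for _ in range(k):
        # Marginal gain of adding task i: sum_j max(sim[i, j] - best[j], 0)
        gains = np.maximum(sim - best, 0.0).sum(axis=1)
        gains[chosen] = -np.inf       # never pick the same task twice
        i = int(np.argmax(gains))
        chosen[i] = True
        selected.append(i)
        best = np.maximum(best, sim[i])
    return selected, float(best.sum())


# Toy, made-up similarity matrix: tasks 0/1 and tasks 2/3 form two clusters.
sim = np.array([
    [1.0, 0.9, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.9],
    [0.0, 0.1, 0.6, 1.0],
])
selected, value = greedy_facility_location(sim, k=2)
print(selected, value)
```

On this toy input the greedy pass picks one task from each cluster, which is the intuition behind using a small representative subset in place of the full benchmark. For non-negative scores the facility location objective is monotone submodular, so greedy selection carries the standard (1 - 1/e) approximation guarantee; whether the authors use greedy selection or another optimizer is not stated in the abstract.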
