

BenTo: Benchmark Task Reduction with In-Context Transferability

October 17, 2024
Authors: Hongyu Zhao, Ming Li, Lichao Sun, Tianyi Zhou
cs.AI

Abstract

Evaluating large language models (LLMs) is costly: it requires the generation and examination of LLM outputs on a large-scale benchmark of various tasks. This paper investigates how to efficiently reduce the tasks used to benchmark LLMs without affecting the evaluation quality. Our study reveals that task transferability and relevance provide critical information for identifying the most representative subset of tasks via optimizing a facility location function. We propose a practically efficient metric for estimating the transferability between two tasks via in-context learning (ICL). By analyzing the pairwise transferability, we can reduce the tasks in a modern LLM benchmark (e.g., MMLU or FLAN) to 5% while inducing less than a 4% difference from the evaluation on the original benchmark. Compared to prior works, our method is training-free, gradient-free, and highly efficient, requiring only ICL.
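
To make the selection step concrete, the sketch below shows a standard greedy maximization of a facility location function over a pairwise transferability matrix. This is a minimal illustration, not the paper's implementation: the matrix `T`, the function name `select_tasks`, and the random stand-in scores are all assumptions; in BenTo the entries of such a matrix would come from the proposed ICL-based transferability estimates.

```python
import numpy as np

def select_tasks(T: np.ndarray, k: int) -> list[int]:
    """Greedily maximize the facility location function
    f(S) = sum_j max_{i in S} T[i, j],
    where T[i, j] is an estimated transferability score from
    task i to task j (hypothetical input, e.g. from ICL)."""
    n = T.shape[0]
    selected: list[int] = []
    # best[j] = current best coverage of task j by the selected set
    best = np.zeros(n)
    for _ in range(k):
        # Marginal gain of adding each candidate task i:
        # new coverage sum minus current coverage sum.
        gains = np.maximum(T, best).sum(axis=1) - best.sum()
        gains[selected] = -np.inf  # exclude already-selected tasks
        i = int(np.argmax(gains))
        selected.append(i)
        best = np.maximum(best, T[i])
    return selected

# Example: reduce a 60-task benchmark to 5% (3 tasks).
rng = np.random.default_rng(0)
T = rng.random((60, 60))  # stand-in for ICL transferability estimates
subset = select_tasks(T, k=max(1, int(0.05 * 60)))
print(subset)
```

Greedy selection is the usual choice here because the facility location function is monotone submodular, so the greedy solution is guaranteed to be within a (1 - 1/e) factor of the optimal subset.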