LOOM-Scope: a comprehensive and efficient LOng-cOntext Model evaluation framework
July 7, 2025
Authors: Zecheng Tang, Haitian Wang, Quantong Qiu, Baibei Ji, Ruoxi Sun, Keyan Zhou, Juntao Li, Min Zhang
cs.AI
Abstract
Long-context processing has become a fundamental capability for large language models (LLMs). To assess models' long-context performance, numerous long-context evaluation benchmarks have been proposed. However, variations in evaluation settings across these benchmarks lead to inconsistent results, making it difficult to draw reliable comparisons. Moreover, the high computational cost of long-context evaluation poses a significant barrier for the community to conduct comprehensive assessments of long-context models. In this paper, we propose LOOM-Scope, a comprehensive and efficient framework for long-context evaluation. LOOM-Scope standardizes evaluation settings across diverse benchmarks, supports deployment of efficient long-context inference acceleration methods, and introduces a holistic yet lightweight benchmark suite to evaluate models comprehensively. Homepage: https://loomscope.github.io
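
The abstract describes LOOM-Scope only at a high level. As a rough illustration of what "standardizing evaluation settings across diverse benchmarks" can mean in practice, below is a minimal, hypothetical Python sketch: the names (EvalConfig, run_benchmark, the acceleration string, and the benchmark identifiers) are assumptions for illustration only, not LOOM-Scope's actual API; see the homepage above for the real interface.

```python
# Hypothetical sketch, not LOOM-Scope's real interface: one shared EvalConfig
# is applied to every benchmark so that results stay comparable across them.
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalConfig:
    """A single, shared set of evaluation settings reused for all benchmarks."""
    max_input_tokens: int = 131072      # context budget given to the model
    truncation: str = "middle"          # how over-long inputs are shortened
    temperature: float = 0.0            # greedy decoding for reproducibility
    max_new_tokens: int = 512
    acceleration: Optional[str] = None  # optional inference acceleration method


def run_benchmark(model_name: str, benchmark: str, cfg: EvalConfig) -> dict:
    """Placeholder for a standardized evaluation loop.

    A real framework would load the benchmark, build prompts from a shared
    template, run inference (optionally with the configured acceleration
    method), and return scores; here we only echo the settings used.
    """
    return {"model": model_name, "benchmark": benchmark, "settings": vars(cfg)}


if __name__ == "__main__":
    cfg = EvalConfig(acceleration="kv-cache-compression")  # assumed method name
    for bench in ["synthetic_retrieval", "long_qa", "summarization"]:  # assumed task names
        print(run_benchmark("my-long-context-model", bench, cfg))
```

The point of the sketch is the design choice the abstract emphasizes: settings such as context budget, truncation, and decoding are fixed once in a shared configuration rather than varying per benchmark, which is what makes cross-benchmark comparisons reliable.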