LOOM-Scope: a comprehensive and efficient LOng-cOntext Model evaluation framework
July 7, 2025
Authors: Zecheng Tang, Haitian Wang, Quantong Qiu, Baibei Ji, Ruoxi Sun, Keyan Zhou, Juntao Li, Min Zhang
cs.AI
Abstract
Long-context processing has become a fundamental capability for large language models (LLMs). To assess models' long-context performance, numerous long-context evaluation benchmarks have been proposed. However, variations in evaluation settings across these benchmarks lead to inconsistent results, making it difficult to draw reliable comparisons. Moreover, the high computational cost of long-context evaluation poses a significant barrier for the community to conduct comprehensive assessments of long-context models. In this paper, we propose LOOM-Scope, a comprehensive and efficient framework for long-context evaluation. LOOM-Scope standardizes evaluation settings across diverse benchmarks, supports deployment of efficient long-context inference acceleration methods, and introduces a holistic yet lightweight benchmark suite to evaluate models comprehensively. Homepage: https://loomscope.github.io
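
The abstract describes LOOM-Scope only at a high level. As a rough illustration of what "standardizing evaluation settings across diverse benchmarks" can mean in practice, below is a minimal, hypothetical Python sketch: the names (EvalConfig, run_benchmark, the acceleration string, and the benchmark identifiers) are assumptions for illustration only, not LOOM-Scope's actual API; see the homepage above for the real interface.

```python
# Hypothetical sketch, not LOOM-Scope's real interface: one shared EvalConfig
# is applied to every benchmark so that results stay comparable across them.
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalConfig:
    """A single, shared set of evaluation settings reused for all benchmarks."""
    max_input_tokens: int = 131072      # context budget given to the model
    truncation: str = "middle"          # how over-long inputs are shortened
    temperature: float = 0.0            # greedy decoding for reproducibility
    max_new_tokens: int = 512
    acceleration: Optional[str] = None  # optional inference acceleration method


def run_benchmark(model_name: str, benchmark: str, cfg: EvalConfig) -> dict:
    """Placeholder for a standardized evaluation loop.

    A real framework would load the benchmark, build prompts from a shared
    template, run inference (optionally with the configured acceleration
    method), and return scores; here we only echo the settings used.
    """
    return {"model": model_name, "benchmark": benchmark, "settings": vars(cfg)}


if __name__ == "__main__":
    cfg = EvalConfig(acceleration="kv-cache-compression")  # assumed method name
    for bench in ["synthetic_retrieval", "long_qa", "summarization"]:  # assumed task names
        print(run_benchmark("my-long-context-model", bench, cfg))
```

The point of the sketch is the design choice the abstract emphasizes: settings such as context budget, truncation, and decoding are fixed once in a shared configuration rather than varying per benchmark, which is what makes cross-benchmark comparisons reliable.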