Redundancy Principles for MLLMs Benchmarks
January 20, 2025
Authors: Zicheng Zhang, Xiangyu Zhao, Xinyu Fang, Chunyi Li, Xiaohong Liu, Xiongkuo Min, Haodong Duan, Kai Chen, Guangtao Zhai
cs.AI
Abstract
With the rapid iteration of Multi-modality Large Language Models (MLLMs) and
the evolving demands of the field, the number of benchmarks produced annually
has surged into the hundreds. This rapid growth has inevitably led to
significant redundancy among benchmarks. Therefore, it is crucial to take a
step back, critically assess the current state of redundancy, and propose
targeted principles for constructing effective MLLM benchmarks. In this paper,
we focus on redundancy from three key perspectives: 1) redundancy of benchmark
capability dimensions, 2) redundancy in the number of test questions, and 3)
cross-benchmark redundancy within specific domains. Through a comprehensive
analysis of hundreds of MLLMs' performance across more than 20 benchmarks, we
aim to quantitatively measure the level of redundancy in existing MLLM
evaluations, provide valuable insights to guide the future development of MLLM
benchmarks, and offer strategies to refine and address redundancy issues
effectively.
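The abstract does not specify how redundancy is quantified, so the following is a minimal illustrative sketch rather than the paper's actual protocol. It assumes one common proxy for cross-benchmark redundancy: if two benchmarks produce nearly identical rankings of the same pool of models, one benchmark adds little information over the other. The score matrix, benchmark names, and data below are all hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical score matrix: rows are MLLMs, columns are benchmarks.
# Real usage would load the scores of hundreds of models on 20+
# benchmarks; random data stands in here for illustration only.
rng = np.random.default_rng(seed=0)
scores = rng.random((100, 5))          # 100 models x 5 benchmarks
names = [f"bench_{i}" for i in range(scores.shape[1])]

# spearmanr on a 2-D array returns the pairwise rank-correlation
# matrix between columns; a high off-diagonal value means two
# benchmarks rank models almost identically, i.e. one benchmark is
# largely redundant given the other.
rho, _ = spearmanr(scores)

for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"{names[i]} vs {names[j]}: SRCC = {rho[i, j]:+.2f}")
```

Rank correlation (SRCC) is used here rather than a linear correlation because benchmark scores live on different scales; for judging redundancy, only the model ranking a benchmark induces matters.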