
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

July 16, 2024
作者: Haodong Duan, Junming Yang, Yuxuan Qiao, Xinyu Fang, Lin Chen, Yuan Liu, Xiaoyi Dong, Yuhang Zang, Pan Zhang, Jiaqi Wang, Dahua Lin, Kai Chen
cs.AI

Abstract
We present VLMEvalKit: an open-source toolkit for evaluating large multi-modality models based on PyTorch. The toolkit aims to provide a user-friendly and comprehensive framework for researchers and developers to evaluate existing multi-modality models and publish reproducible evaluation results. In VLMEvalKit, we implement over 70 different large multi-modality models, including both proprietary APIs and open-source models, as well as more than 20 different multi-modal benchmarks. By implementing a single interface, new models can be easily added to the toolkit, while the toolkit automatically handles the remaining workloads, including data preparation, distributed inference, prediction post-processing, and metric calculation. Although the toolkit is currently mainly used for evaluating large vision-language models, its design is compatible with future updates that incorporate additional modalities, such as audio and video. Based on the evaluation results obtained with the toolkit, we host OpenVLM Leaderboard, a comprehensive leaderboard to track the progress of multi-modality learning research. The toolkit is released at https://github.com/open-compass/VLMEvalKit and is actively maintained.
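The abstract's key design point is that a contributor implements a single model interface, and the toolkit handles data preparation, distributed inference, post-processing, and metric calculation around it. The sketch below illustrates that pattern in minimal form; the class names (`BaseVLM`, `EchoVLM`), the message schema, and the `evaluate` loop are illustrative assumptions, not VLMEvalKit's actual API.

```python
from abc import ABC, abstractmethod

class BaseVLM(ABC):
    """Hypothetical single-method interface: a model maps an
    interleaved image/text message to a text prediction."""

    @abstractmethod
    def generate(self, message: list) -> str:
        """`message` is a list of {'type': 'image'|'text', 'value': ...} parts
        (assumed schema for this sketch)."""

class EchoVLM(BaseVLM):
    """Toy stand-in model: concatenates the text parts, ignores images."""

    def generate(self, message: list) -> str:
        texts = [p["value"] for p in message if p["type"] == "text"]
        return " ".join(texts)

def evaluate(model: BaseVLM, samples: list) -> list:
    # The toolkit's loop over benchmark samples; in the real toolkit,
    # dataset download, sharded inference across GPUs, answer extraction,
    # and metric computation would wrap around this call.
    return [model.generate(s) for s in samples]

if __name__ == "__main__":
    sample = [{"type": "image", "value": "img_0001.jpg"},
              {"type": "text", "value": "What is in the picture?"}]
    print(evaluate(EchoVLM(), [sample]))  # → ['What is in the picture?']
```

Because every model sits behind the same `generate` contract, adding a new model (open-source or a proprietary API wrapper) requires only one class, and all benchmarks become available to it automatically.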
