
FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models

July 16, 2024
作者: Pengxiang Li, Zhi Gao, Bofei Zhang, Tao Yuan, Yuwei Wu, Mehrtash Harandi, Yunde Jia, Song-Chun Zhu, Qing Li
cs.AI

Abstract

Vision language models (VLMs) have achieved impressive progress in diverse applications, becoming a prevalent research direction. In this paper, we build FIRE, a feedback-refinement dataset consisting of 1.1M multi-turn conversations derived from 27 source datasets, empowering VLMs to spontaneously refine their responses based on user feedback across diverse tasks. To scale up the data collection, FIRE is collected in two components: FIRE-100K and FIRE-1M, where FIRE-100K is generated by GPT-4V, and FIRE-1M is freely generated via models trained on FIRE-100K. Then, we build FIRE-Bench, a benchmark to comprehensively evaluate the feedback-refining capability of VLMs, which contains 11K feedback-refinement conversations as the test data, two evaluation settings, and a model that provides feedback for VLMs. We develop the FIRE-LLaVA model by fine-tuning LLaVA on FIRE-100K and FIRE-1M; it shows remarkable feedback-refining capability on FIRE-Bench and outperforms untrained VLMs by 50%, enabling more efficient user-agent interactions and underscoring the significance of the FIRE dataset.
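The feedback-refinement setting described above can be pictured as a simple multi-turn loop: the VLM answers, a feedback model scores and critiques the answer, and the VLM refines until the feedback model is satisfied or a turn budget runs out. The sketch below illustrates that control flow only; the `answer` and `give_feedback` functions are hypothetical stand-ins, not the actual models or scoring used in FIRE-Bench.

```python
# Minimal sketch of a feedback-refinement dialogue loop.
# All model functions here are hypothetical stubs standing in for
# a real VLM and a trained feedback model.

def answer(question, history):
    """Stand-in VLM: produces a better answer once it has seen feedback."""
    return "refined answer" if history else "initial answer"

def give_feedback(question, response, ground_truth):
    """Stand-in feedback model: score the response and return a comment."""
    score = 10 if response == ground_truth else 3
    comment = "Correct." if score == 10 else "Partially wrong; reconsider."
    return score, comment

def refinement_dialogue(question, ground_truth, max_turns=3):
    """Run the answer -> feedback -> refine loop for up to max_turns turns."""
    history = []  # list of (response, feedback comment) pairs
    for _ in range(max_turns):
        response = answer(question, history)
        score, comment = give_feedback(question, response, ground_truth)
        history.append((response, comment))
        if score == 10:  # feedback model judges the answer correct
            break
    return history

dialogue = refinement_dialogue("What is in the image?", "refined answer")
print(len(dialogue))  # → 2: one initial answer plus one refinement
```

In FIRE-Bench terms, fewer turns to a correct answer means a more efficient user-agent interaction, which is the quantity the benchmark's evaluation settings measure.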

