SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers
July 12, 2024
Authors: Shraman Pramanick, Rama Chellappa, Subhashini Venugopalan
cs.AI
Abstract
Seeking answers to questions within long scientific research articles is a
crucial area of study that aids readers in quickly addressing their inquiries.
However, existing question-answering (QA) datasets based on scientific papers
are limited in scale and focus solely on textual content. To address this
limitation, we introduce SPIQA (Scientific Paper Image Question Answering), the
first large-scale QA dataset specifically designed to interpret complex figures
and tables within the context of scientific research articles across various
domains of computer science. Leveraging the breadth of expertise and ability of
multimodal large language models (MLLMs) to understand figures, we employ
automatic and manual curation to create the dataset. We craft an
information-seeking task involving multiple images that cover a wide variety of
plots, charts, tables, schematic diagrams, and result visualizations. SPIQA
comprises 270K questions divided into training, validation, and three different
evaluation splits. Through extensive experiments with 12 prominent foundational
models, we evaluate the ability of current multimodal systems to comprehend the
nuanced aspects of research articles. Additionally, we propose a
Chain-of-Thought (CoT) evaluation strategy with in-context retrieval that
allows fine-grained, step-by-step assessment and improves model performance. We
further explore the upper bounds of performance enhancement with additional
textual information, highlighting its promising potential for future research
and the dataset's impact on revolutionizing how we interact with scientific
literature.