ChatPaper.ai


Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images

September 9, 2025
Authors: Boammani Aser Lompo, Marc Haraoui
cs.AI

Abstract

Visual reasoning over structured data such as tables is a critical capability for modern vision-language models (VLMs), yet current benchmarks remain limited in scale, diversity, or reasoning depth, especially when it comes to rendered table images. Addressing this gap, we introduce Visual-TableQA, a large-scale, open-domain multimodal dataset specifically designed to evaluate and enhance visual reasoning over complex tabular data. Our generation pipeline is modular, scalable, and fully autonomous, involving multiple reasoning LLMs collaborating across distinct roles: generation, validation, and inspiration. Visual-TableQA comprises 2.5k richly structured LaTeX-rendered tables and 6k reasoning-intensive QA pairs, all produced at a cost of under USD 100. To promote diversity and creativity, our pipeline performs multi-model collaborative data generation via cross-model prompting ('inspiration') and LLM-jury filtering. Stronger models seed layouts and topics that weaker models elaborate, collectively distilling diverse reasoning patterns and visual structures into the dataset. Empirical results show that models fine-tuned on Visual-TableQA generalize robustly to external benchmarks, outperforming several proprietary models despite the dataset's synthetic nature. The full pipeline and resources are publicly available at https://github.com/AI-4-Everyone/Visual-TableQA.
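The abstract names two mechanisms, cross-model "inspiration" prompting and LLM-jury filtering, without detailing how they work. What follows is a minimal Python sketch under stated assumptions, not the authors' implementation (the linked repository holds that). The `Sample` record, `inspiration_prompt`, and `jury_filter` are hypothetical names. The jury is assumed to decide by strict majority vote. The rule-based judges are toy stand-ins for what would be calls to different LLMs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    """Hypothetical record: one LaTeX table plus one question-answer pair."""
    table_latex: str
    question: str
    answer: str


# A judge returns True if it accepts a sample. In a real pipeline each judge
# would wrap a call to a different LLM (assumption); here they are plain functions.
Judge = Callable[[Sample], bool]


def inspiration_prompt(seed_tables: List[str], topic: str) -> str:
    """Build a prompt that shows a generator tables produced by other
    (e.g. stronger) models and asks it to elaborate a new one."""
    examples = "\n\n".join(seed_tables)
    return (
        "Example LaTeX tables from other models:\n\n"
        f"{examples}\n\n"
        f"Write a new, more complex LaTeX table about: {topic}"
    )


def jury_filter(samples: List[Sample], judges: List[Judge]) -> List[Sample]:
    """Keep only samples accepted by a strict majority of the judges
    (majority voting is an assumed aggregation rule)."""
    kept = []
    for sample in samples:
        votes = sum(1 for judge in judges if judge(sample))
        if 2 * votes > len(judges):  # strict majority
            kept.append(sample)
    return kept


# Usage with toy rule-based judges standing in for LLM calls.
judges: List[Judge] = [
    lambda s: bool(s.answer.strip()),               # answer is non-empty
    lambda s: r"\begin{tabular}" in s.table_latex,  # table source looks like LaTeX
    lambda s: s.question.strip().endswith("?"),     # question is well-formed
]

samples = [
    Sample(r"\begin{tabular}{lr} A & 1 \\ B & 2 \end{tabular}",
           "Which row has the larger value?", "B"),
    Sample(r"\begin{tabular}{lr} A & 1 \end{tabular}", "", ""),
]

kept = jury_filter(samples, judges)  # only the first sample survives (3 of 3 votes)
prompt = inspiration_prompt([samples[0].table_latex], "quarterly energy prices")
```

A majority vote is only one plausible aggregation rule. A unanimous or weighted jury would trade dataset size for stricter quality control.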