ChartCitor: Multi-Agent Framework for Fine-Grained Chart Visual Attribution
February 3, 2025
Authors: Kanika Goswami, Puneet Mathur, Ryan Rossi, Franck Dernoncourt
cs.AI
Abstract
Large Language Models (LLMs) can perform chart question-answering tasks but
often generate unverified hallucinated responses. Existing answer attribution
methods struggle to ground responses in source charts due to limited
visual-semantic context, complex visual-text alignment requirements, and
difficulties in bounding box prediction across complex layouts. We present
ChartCitor, a multi-agent framework that provides fine-grained bounding box
citations by identifying supporting evidence within chart images. The system
orchestrates LLM agents to perform chart-to-table extraction, answer
reformulation, table augmentation, evidence retrieval through pre-filtering and
re-ranking, and table-to-chart mapping. ChartCitor outperforms existing
baselines across different chart types. Qualitative user studies show that
ChartCitor helps increase user trust in Generative AI by providing enhanced
explainability for LLM-assisted chart QA and enables professionals to be more
productive.
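To make the retrieval and mapping stages named in the abstract more concrete, here is a minimal Python sketch of pre-filtering, re-ranking, and table-to-chart mapping over an already extracted table. Everything in it is an assumption for illustration: the `Cell` structure, the bounding box layout, and the token-overlap scoring are toy stand-ins, not the LLM agents or prompts that ChartCitor uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Cell:
    """One extracted table cell (hypothetical structure, not from the paper)."""
    row: str                          # row label, e.g. an x-axis category
    column: str                       # column label, e.g. a series name
    value: str                        # cell value as text
    bbox: tuple[int, int, int, int]   # assumed (x0, y0, x1, y1) of the chart mark


def tokens(text: str) -> set[str]:
    """Lowercase whitespace tokenization; a deliberately crude stand-in."""
    return set(text.lower().replace(",", " ").split())


def overlap(claim: str, cell: Cell) -> int:
    """Number of tokens shared by the claim and the cell's labels and value."""
    return len(tokens(claim) & tokens(f"{cell.row} {cell.column} {cell.value}"))


def prefilter(claim: str, cells: list[Cell], keep: int = 3) -> list[Cell]:
    """Cheap first pass: drop cells with no overlap, keep the best `keep`."""
    scored = [(overlap(claim, c), c) for c in cells]
    scored = [(s, c) for s, c in scored if s > 0]
    scored.sort(key=lambda sc: sc[0], reverse=True)
    return [c for _, c in scored[:keep]]


def rerank(claim: str, candidates: list[Cell],
           score: Callable[[str, Cell], float]) -> list[Cell]:
    """Second pass: order candidates by a (possibly expensive) scorer."""
    return sorted(candidates, key=lambda c: score(claim, c), reverse=True)


def toy_score(claim: str, cell: Cell) -> float:
    """Stand-in for a model-based relevance score: reward an exact value match."""
    bonus = 2.0 if cell.value.lower() in tokens(claim) else 0.0
    return overlap(claim, cell) + bonus


def cite(claim: str, cells: list[Cell],
         score: Callable[[str, Cell], float] = toy_score,
         top_k: int = 1) -> list[tuple[int, int, int, int]]:
    """Map the top-ranked evidence cells back to their chart bounding boxes."""
    ranked = rerank(claim, prefilter(claim, cells), score)
    return [c.bbox for c in ranked[:top_k]]


# Made-up table for a small bar chart; values and boxes are illustrative only.
cells = [
    Cell("2021", "Revenue", "40", (10, 50, 30, 90)),
    Cell("2022", "Revenue", "55", (40, 30, 60, 90)),
    Cell("2022", "Cost", "20", (40, 70, 60, 90)),
]
print(cite("Revenue in 2022 was 55", cells))  # [(40, 30, 60, 90)]
```

Passing the scorer as a callable keeps the two-pass shape visible: a cheap filter narrows the candidate cells, and a more capable scorer (an LLM call in a real system) only has to rank the survivors.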