ChartCitor: Multi-Agent Framework for Fine-Grained Chart Visual Attribution

Abstract

Large Language Models (LLMs) can perform chart question-answering (QA) tasks but often generate unverified, hallucinated responses. Existing answer attribution methods struggle to ground responses in the source chart because of limited visual-semantic context, complex visual-text alignment requirements, and the difficulty of predicting bounding boxes across complex layouts. We present ChartCitor, a multi-agent framework that provides fine-grained bounding box citations by identifying the supporting evidence within chart images. The system orchestrates LLM agents to perform chart-to-table extraction, answer reformulation, table augmentation, evidence retrieval through pre-filtering and re-ranking, and table-to-chart mapping. ChartCitor outperforms existing baselines across different chart types. Qualitative user studies show that ChartCitor helps increase user trust in Generative AI by providing enhanced explainability for LLM-assisted chart QA, and that it enables professionals to be more productive.
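The abstract names the pipeline stages but not how each one is implemented. As a rough illustration of how the stages fit together, here is a minimal sketch under explicit assumptions: chart-to-table extraction is taken as already done, yielding table cells plus a pixel bounding box for each cell's mark; table augmentation is omitted; and every LLM agent is replaced by a simple deterministic heuristic. The names `Cell`, `reformulate`, `prefilter`, `rerank`, and `cite`, the keyword-matching pre-filter, and the scoring rule in the re-ranker are all hypothetical stand-ins, not ChartCitor's actual components or API.

```python
import re
from dataclasses import dataclass

BBox = tuple[int, int, int, int]  # (x0, y0, x1, y1) of a chart mark, in pixels


@dataclass(frozen=True)
class Cell:
    """One cell of the table recovered from the chart (hypothetical schema)."""
    row: str      # e.g. an x-axis category such as "2021"
    col: str      # e.g. a series name such as "Revenue"
    value: float


def reformulate(answer: str) -> list[str]:
    """Stand-in for the answer-reformulation agent: split into atomic facts."""
    return [f.strip() for f in re.split(r"\band\b|;", answer) if f.strip()]


def prefilter(cells: list[Cell], fact: str) -> list[Cell]:
    """Cheap lexical pre-filter: keep cells whose row or column label occurs in the fact."""
    text = fact.lower()
    return [c for c in cells if c.row.lower() in text or c.col.lower() in text]


def rerank(cells: list[Cell], fact: str, k: int = 1) -> list[Cell]:
    """Stand-in for the LLM re-ranker: score candidates and keep the top k.

    A cell scores 2 if its value is a number stated in the fact, plus 1 for
    each of its row and column labels that appears in the fact.
    """
    text = fact.lower()
    numbers = {float(n) for n in re.findall(r"-?\d+(?:\.\d+)?", fact)}

    def score(c: Cell) -> int:
        return (2 * (c.value in numbers)
                + int(c.row.lower() in text) + int(c.col.lower() in text))

    return sorted(cells, key=score, reverse=True)[:k]


def cite(answer: str, cells: list[Cell], boxes: dict[Cell, BBox]) -> dict[str, list[BBox]]:
    """Map each atomic fact in the answer to the bounding boxes of its evidence."""
    citations = {}
    for fact in reformulate(answer):
        evidence = rerank(prefilter(cells, fact), fact)
        citations[fact] = [boxes[c] for c in evidence]  # table-to-chart mapping
    return citations


if __name__ == "__main__":
    cells = [Cell("2021", "Revenue", 120.0), Cell("2022", "Revenue", 150.0),
             Cell("2021", "Cost", 80.0), Cell("2022", "Cost", 95.0)]
    # Hypothetical pixel boxes that a chart parser would attach to each mark.
    boxes = {c: (40 * i, 100, 40 * i + 30, 300) for i, c in enumerate(cells)}
    print(cite("Revenue in 2022 was 150 and Cost in 2021 was 80", cells, boxes))
    # {'Revenue in 2022 was 150': [(40, 100, 70, 300)], 'Cost in 2021 was 80': [(80, 100, 110, 300)]}
```

Splitting retrieval into a cheap pre-filter followed by a more selective re-ranker is a common way to keep the expensive scoring step focused on a small candidate set. In the framework described above, these steps are performed by LLM agents rather than by keyword and number matching.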