Follow the Flow: Fine-grained Flowchart Attribution with Neurosymbolic Agents
June 2, 2025
Authors: Manan Suri, Puneet Mathur, Nedim Lipka, Franck Dernoncourt, Ryan A. Rossi, Vivek Gupta, Dinesh Manocha
cs.AI
Abstract
Flowcharts are a critical tool for visualizing decision-making processes.
However, their non-linear structure and complex visual-textual relationships
make it challenging to interpret them using LLMs, as vision-language models
frequently hallucinate nonexistent connections and decision paths when
analyzing these diagrams. This compromises the reliability of automated
flowchart processing in critical domains such as logistics, healthcare, and
engineering. We introduce the task of Fine-grained Flowchart Attribution, which
traces the specific flowchart components that ground an LLM response about a
flowchart.
Flowchart Attribution ensures the verifiability of LLM predictions and improves
explainability by linking generated responses to the flowchart's structure. We
propose FlowPathAgent, a neurosymbolic agent that performs fine-grained post-hoc
attribution through graph-based reasoning. It first segments the flowchart,
then converts it into a structured symbolic graph, and finally employs an
agentic approach that dynamically interacts with the graph to generate
attribution paths.
Additionally, we present FlowExplainBench, a novel benchmark for evaluating
flowchart attributions across diverse styles, domains, and question types.
Experimental results show that FlowPathAgent mitigates visual hallucinations in
LLM answers to flowchart questions (flowchart QA), outperforming strong
baselines by 10-14% on our proposed FlowExplainBench dataset.
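To make the idea of attributing an answer to a symbolic graph concrete, here is a minimal Python sketch. It is not FlowPathAgent: the segmentation and LLM-agent stages are omitted entirely, and the names `Flowchart` and `trace_path` as well as the toy order-handling chart are hypothetical. It assumes the flowchart has already been converted into nodes and condition-labelled edges, and shows how the branch choices implied by an answer can be grounded in an explicit path of nodes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Flowchart:
    """Hypothetical symbolic graph: node id -> node text, plus labelled edges."""
    nodes: dict[str, str]
    # Edges are keyed by (source node, branch condition); "" means unconditional.
    edges: dict[tuple[str, str], str] = field(default_factory=dict)

    def add_edge(self, src: str, dst: str, condition: str = "") -> None:
        self.edges[(src, condition)] = dst

    def successor(self, node: str, condition: str = "") -> str | None:
        return self.edges.get((node, condition))


def trace_path(chart: Flowchart, start: str, decisions: dict[str, str]) -> list[str]:
    """Walk from `start`, taking the branch named in `decisions` at decision
    nodes, and return the visited node ids as a (hypothetical) attribution path."""
    path = [start]
    node = start
    while True:
        # At a decision node use the chosen branch; otherwise follow the plain edge.
        nxt = chart.successor(node, decisions.get(node, ""))
        if nxt is None or nxt in path:  # stop at a terminal node or on a cycle
            return path
        path.append(nxt)
        node = nxt


# Toy chart (illustrative only): an order is shipped or backordered.
chart = Flowchart(nodes={
    "start": "Receive order",
    "stock": "In stock?",
    "ship": "Ship item",
    "backorder": "Place backorder",
    "end": "Close order",
})
chart.add_edge("start", "stock")
chart.add_edge("stock", "ship", "yes")
chart.add_edge("stock", "backorder", "no")
chart.add_edge("ship", "end")
chart.add_edge("backorder", "end")

path = trace_path(chart, "start", {"stock": "no"})
print(path)                              # ['start', 'stock', 'backorder', 'end']
print([chart.nodes[n] for n in path])
```

Because every node on the returned path exists in the graph, an answer such as "the order is backordered" can be checked against explicit components rather than against the model's impression of the image; an edge that a vision-language model hallucinated would simply have no entry in `edges`.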