Follow the Flow: Fine-grained Flowchart Attribution with Neurosymbolic Agents
June 2, 2025
Authors: Manan Suri, Puneet Mathur, Nedim Lipka, Franck Dernoncourt, Ryan A. Rossi, Vivek Gupta, Dinesha Manocha
cs.AI
Abstract
Flowcharts are a critical tool for visualizing decision-making processes.
However, their non-linear structure and complex visual-textual relationships
make it challenging to interpret them using LLMs, as vision-language models
frequently hallucinate nonexistent connections and decision paths when
analyzing these diagrams. This leads to compromised reliability for automated
flowchart processing in critical domains such as logistics, health, and
engineering. We introduce the task of Fine-grained Flowchart Attribution, which
traces the specific flowchart components that ground an LLM response referring to that flowchart.
Flowchart Attribution ensures the verifiability of LLM predictions and improves
explainability by linking generated responses to the flowchart's structure. We
propose FlowPathAgent, a neurosymbolic agent that performs fine-grained post
hoc attribution through graph-based reasoning. It first segments the flowchart,
converts it into a structured symbolic graph, and then employs an agentic
approach to interact dynamically with the graph and generate attribution paths.
Additionally, we present FlowExplainBench, a novel benchmark for evaluating
flowchart attributions across diverse styles, domains, and question types.
Experimental results show that FlowPathAgent mitigates visual hallucinations in
LLM answers over flowchart QA, outperforming strong baselines by 10-14% on our
proposed FlowExplainBench dataset.
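To make the idea of attribution paths concrete, the following is a minimal illustrative sketch, not the paper's implementation: a flowchart encoded as a symbolic directed graph, and a simple traversal that attributes an answer to the chain of nodes supporting it. All node names, labels, and the `attribution_path` helper are hypothetical.

```python
# Hypothetical flowchart as a symbolic graph: node id -> (type, label).
nodes = {
    "start": ("terminal", "Start"),
    "check": ("decision", "Order in stock?"),
    "ship":  ("process",  "Ship order"),
    "back":  ("process",  "Backorder item"),
    "end":   ("terminal", "End"),
}

# Directed edges: (source, target, branch label; empty for unconditional).
edges = [
    ("start", "check", ""),
    ("check", "ship",  "yes"),
    ("check", "back",  "no"),
    ("ship",  "end",   ""),
    ("back",  "end",   ""),
]

def successors(node):
    """Return (target, branch label) pairs one step from `node`."""
    return [(t, lbl) for s, t, lbl in edges if s == node]

def attribution_path(start, decisions):
    """Follow the flow from `start`, resolving each decision node with the
    branch recorded in `decisions`; return the visited node ids as the
    attribution path supporting an answer."""
    path, current = [start], start
    while True:
        nxt = successors(current)
        if not nxt:               # reached a sink node
            return path
        if nodes[current][0] == "decision":
            # Take the branch whose label matches the stated decision.
            current = next(t for t, lbl in nxt if lbl == decisions[current])
        else:
            current = nxt[0][0]   # single unconditional successor
        path.append(current)

# Attribute the answer "the order is shipped" to its supporting path.
print(attribution_path("start", {"check": "yes"}))
# → ['start', 'check', 'ship', 'end']
```

The path returned is the fine-grained evidence a verifier can check against the diagram: every edge in it must exist in the flowchart, which is exactly the kind of grounding that rules out hallucinated connections.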