Structured Extraction from Business Process Diagrams Using Vision-Language Models

November 27, 2025
Authors: Pritam Deka, Barry Devereux
cs.AI

Abstract

Business Process Model and Notation (BPMN) is a widely adopted standard for representing complex business workflows. While BPMN diagrams are often exchanged as visual images, existing methods primarily rely on XML representations for computational analysis. In this work, we present a pipeline that leverages Vision-Language Models (VLMs) to extract structured JSON representations of BPMN diagrams directly from images, without requiring source model files or textual annotations. We also incorporate optical character recognition (OCR) for textual enrichment and evaluate the generated element lists against ground truth data derived from the source XML files. Our approach enables robust component extraction in scenarios where original source files are unavailable. We benchmark multiple VLMs and observe performance improvements in several models when OCR is used for text enrichment. In addition, we conducted extensive statistical analyses of OCR-based enrichment methods and prompt ablation studies, providing a clearer understanding of their impact on model performance.
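
The evaluation step described above, comparing a generated element list against ground truth derived from the source XML, might look roughly like the following. This is a minimal sketch and not the authors' implementation: the JSON layout (an `elements` array of objects with `type` and `name`), the set of scored element types, the label normalization, and the exact-match precision/recall/F1 scoring are all assumptions made for illustration. Only the BPMN 2.0 model namespace is taken from the standard.

```python
import json
import xml.etree.ElementTree as ET
from collections import Counter

# Official BPMN 2.0 model namespace.
BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"

# Assumption: which element types are scored (the abstract does not list them).
ELEMENT_TYPES = {"startEvent", "endEvent", "task", "userTask",
                 "serviceTask", "exclusiveGateway", "parallelGateway"}


def normalize(label):
    """Lowercase and collapse whitespace so trivial OCR/VLM differences match."""
    return " ".join(label.lower().split())


def ground_truth_elements(bpmn_xml):
    """Collect (type, name) pairs from a BPMN XML string as a multiset."""
    root = ET.fromstring(bpmn_xml)
    prefix = "{" + BPMN_NS + "}"
    found = Counter()
    for node in root.iter():
        if node.tag.startswith(prefix):
            local = node.tag[len(prefix):]
            if local in ELEMENT_TYPES:
                found[(local, normalize(node.get("name", "")))] += 1
    return found


def predicted_elements(vlm_json):
    """Collect (type, name) pairs from a hypothetical VLM JSON output."""
    data = json.loads(vlm_json)
    return Counter((e["type"], normalize(e.get("name", "")))
                   for e in data["elements"])


def score(pred, gold):
    """Exact-match precision, recall, and F1 over the two multisets."""
    tp = sum((pred & gold).values())  # multiset intersection
    n_pred, n_gold = sum(pred.values()), sum(gold.values())
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Illustrative data only; not taken from the paper.
SAMPLE_XML = """<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL">
  <process id="p1">
    <startEvent id="s1" name="Order received"/>
    <task id="t1" name="Check stock"/>
    <exclusiveGateway id="g1" name="In stock?"/>
    <endEvent id="e1" name="Order shipped"/>
  </process>
</definitions>"""

SAMPLE_JSON = json.dumps({"elements": [
    {"type": "startEvent", "name": "Order received"},
    {"type": "task", "name": "check  stock"},
    {"type": "exclusiveGateway", "name": "In stock?"},
    {"type": "task", "name": "Ship order"},
]})

precision, recall, f1 = score(predicted_elements(SAMPLE_JSON),
                              ground_truth_elements(SAMPLE_XML))
print(precision, recall, f1)
```

Exact matching on normalized labels is the strictest reasonable choice; a fuzzy string similarity threshold would be more forgiving of OCR noise, at the cost of an extra tunable parameter.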