Structured Extraction from Business Process Diagrams Using Vision-Language Models

November 27, 2025
Authors: Pritam Deka, Barry Devereux
cs.AI

Abstract

Business Process Model and Notation (BPMN) is a widely adopted standard for representing complex business workflows. While BPMN diagrams are often exchanged as visual images, existing methods primarily rely on XML representations for computational analysis. In this work, we present a pipeline that leverages Vision-Language Models (VLMs) to extract structured JSON representations of BPMN diagrams directly from images, without requiring source model files or textual annotations. We also incorporate optical character recognition (OCR) for textual enrichment and evaluate the generated element lists against ground truth data derived from the source XML files. Our approach enables robust component extraction in scenarios where original source files are unavailable. We benchmark multiple VLMs and observe performance improvements in several models when OCR is used for text enrichment. In addition, we conduct extensive statistical analyses of OCR-based enrichment methods and prompt ablation studies, providing a clearer understanding of their impact on model performance.
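The abstract does not specify the JSON schema or the matching rule used when comparing generated element lists with the XML-derived ground truth. As an illustration only, the Python sketch below assumes a hypothetical output schema `{"elements": [{"type": ..., "name": ...}]}`, a hypothetical subset of BPMN element types (`ELEMENT_TYPES`), case- and whitespace-insensitive label matching, and multiset precision/recall/F1 over `(type, name)` pairs. These choices, and the helper names `normalize`, `ground_truth_from_xml`, `predictions_from_json`, and `precision_recall_f1`, are assumptions for illustration, not the authors' evaluation protocol. Only the BPMN 2.0 model namespace URI comes from the OMG specification.

```python
import json
import xml.etree.ElementTree as ET
from collections import Counter

# BPMN 2.0 model namespace as defined by the OMG specification.
BPMN_NS = "{http://www.omg.org/spec/BPMN/20100524/MODEL}"

# Hypothetical subset of element types to score (assumption, not from the paper).
ELEMENT_TYPES = {
    "task", "userTask", "serviceTask", "startEvent", "endEvent",
    "exclusiveGateway", "parallelGateway", "sequenceFlow",
}


def normalize(label: str) -> str:
    """Lower-case a label and collapse internal whitespace (assumed matching rule)."""
    return " ".join(label.lower().split())


def ground_truth_from_xml(xml_text: str) -> Counter:
    """Count (type, normalized name) pairs for selected elements in a BPMN XML file."""
    gold = Counter()
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag.startswith(BPMN_NS):
            local = elem.tag[len(BPMN_NS):]  # strip the "{namespace}" prefix
            if local in ELEMENT_TYPES:
                gold[(local, normalize(elem.get("name", "")))] += 1
    return gold


def predictions_from_json(json_text: str) -> Counter:
    """Count (type, normalized name) pairs in a VLM output using a hypothetical schema."""
    data = json.loads(json_text)
    return Counter(
        (e["type"], normalize(e.get("name", ""))) for e in data.get("elements", [])
    )


def precision_recall_f1(pred: Counter, gold: Counter) -> tuple[float, float, float]:
    """Multiset precision, recall and F1 over exactly matching pairs."""
    tp = sum((pred & gold).values())  # Counter '&' keeps the minimum of each count
    n_pred, n_gold = sum(pred.values()), sum(gold.values())
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_gold if n_gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Toy inputs invented for illustration; not data from the paper.
SAMPLE_XML = """<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL">
  <process id="p1">
    <startEvent id="s1" name="Order received"/>
    <task id="t1" name="Check stock"/>
    <exclusiveGateway id="g1" name="In stock?"/>
    <endEvent id="e1" name="Order shipped"/>
  </process>
</definitions>"""

SAMPLE_JSON = json.dumps({"elements": [
    {"type": "startEvent", "name": "Order received"},
    {"type": "task", "name": "check  stock"},
    {"type": "endEvent", "name": "Order cancelled"},
]})

if __name__ == "__main__":
    p, r, f1 = precision_recall_f1(
        predictions_from_json(SAMPLE_JSON), ground_truth_from_xml(SAMPLE_XML)
    )
    print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")  # prints P=0.667 R=0.500 F1=0.571
```

In the toy example the prediction misses the gateway and mislabels the end event, leaving two true positives out of three predicted and four gold elements. Exact label matching is deliberately strict: a single OCR or VLM misreading counts as a miss, so an evaluation of this kind might instead use a fuzzy string similarity threshold.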
December 3, 2025