ForCenNet: Foreground-Centric Network for Document Image Rectification
July 26, 2025
作者: Peng Cai, Qiang Li, Kaicheng Yang, Dong Guo, Jia Li, Nan Zhou, Xiang An, Ninghua Yang, Jiankang Deng
cs.AI
Abstract
Document image rectification aims to eliminate geometric deformation in
photographed documents to facilitate text recognition. However, existing
methods often neglect the significance of foreground elements, which provide
essential geometric references and layout information for document image
correction. In this paper, we introduce Foreground-Centric Network (ForCenNet)
to eliminate geometric distortions in document images. Specifically, we
first propose a foreground-centric label generation method, which extracts
detailed foreground elements from an undistorted image. Then we introduce a
foreground-centric mask mechanism to enhance the distinction between readable
and background regions. Furthermore, we design a curvature consistency loss to
leverage the detailed foreground labels to help the model understand the
distorted geometric distribution. Extensive experiments demonstrate that
ForCenNet achieves a new state of the art on four real-world benchmarks:
DocUNet, DIR300, WarpDoc, and DocReal. Quantitative analysis shows that the
proposed method effectively undistorts layout elements, such as text lines and
table borders. Resources for further comparison are available at
https://github.com/caipeng328/ForCenNet.