ForCenNet: Foreground-Centric Network for Document Image Rectification
July 26, 2025
作者: Peng Cai, Qiang Li, Kaicheng Yang, Dong Guo, Jia Li, Nan Zhou, Xiang An, Ninghua Yang, Jiankang Deng
cs.AI
Abstract
Document image rectification aims to eliminate geometric deformation in
photographed documents to facilitate text recognition. However, existing
methods often neglect the significance of foreground elements, which provide
essential geometric references and layout information for document image
correction. In this paper, we introduce Foreground-Centric Network (ForCenNet)
to eliminate geometric distortions in document images. Specifically, we
first propose a foreground-centric label generation method, which extracts
detailed foreground elements from an undistorted image. Then we introduce a
foreground-centric mask mechanism to enhance the distinction between readable
and background regions. Furthermore, we design a curvature consistency loss
that leverages the detailed foreground labels to help the model understand the
distorted geometric distribution. Extensive experiments demonstrate that
ForCenNet achieves new state-of-the-art results on four real-world benchmarks:
DocUNet, DIR300, WarpDoc, and DocReal. Quantitative analysis shows that the
proposed method effectively undistorts layout elements, such as text lines and
table borders. The resources for further comparison are provided at
https://github.com/caipeng328/ForCenNet.
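
The abstract does not specify how the curvature consistency loss is computed. As a rough illustration only, the minimal PyTorch sketch below assumes the loss compares a discrete curvature profile (turning angles) of points sampled along predicted foreground curves, such as text lines, against the profile of their undistorted ground-truth labels; the tensor shapes, function names, and the turning-angle formulation are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def polyline_curvature(points: torch.Tensor) -> torch.Tensor:
    """Discrete curvature proxy for polylines of shape (N, P, 2):
    the turning angle at each interior sample point."""
    d_in = points[:, 1:-1] - points[:, :-2]   # incoming segment vectors
    d_out = points[:, 2:] - points[:, 1:-1]   # outgoing segment vectors
    cos = F.cosine_similarity(d_in, d_out, dim=-1).clamp(-1 + 1e-6, 1 - 1e-6)
    return torch.acos(cos)                    # (N, P-2) turning angles


def curvature_consistency_loss(pred_pts: torch.Tensor,
                               gt_pts: torch.Tensor) -> torch.Tensor:
    """L1 distance between curvature profiles of predicted (rectified)
    foreground curves and their undistorted ground-truth labels."""
    return torch.abs(polyline_curvature(pred_pts)
                     - polyline_curvature(gt_pts)).mean()


# Toy usage: 4 foreground curves, each with 16 sampled 2D points.
pred = torch.rand(4, 16, 2, requires_grad=True)
gt = torch.rand(4, 16, 2)
loss = curvature_consistency_loss(pred, gt)
loss.backward()
```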