OmniLayout: Enabling Coarse-to-Fine Learning with LLMs for Universal Document Layout Generation

October 30, 2025
Authors: Hengrui Kang, Zhuangcheng Gu, Zhiyuan Zhao, Zichen Wen, Bin Wang, Weijia Li, Conghui He
cs.AI

Abstract

Document AI has advanced rapidly and is attracting increasing attention. Yet, while most efforts have focused on document layout analysis (DLA), its generative counterpart, document layout generation, remains underexplored. A major obstacle lies in the scarcity of diverse layouts: academic papers with Manhattan-style structures dominate existing studies, while open-world genres such as newspapers and magazines remain severely underrepresented. To address this gap, we curate OmniLayout-1M, the first million-scale dataset of diverse document layouts, covering six common document types and comprising contemporary layouts collected from multiple sources. Moreover, since existing methods struggle in complex domains and often fail to arrange long sequences coherently, we introduce OmniLayout-LLM, a 0.5B model with a two-stage Coarse-to-Fine learning paradigm: 1) learning universal layout principles from OmniLayout-1M with coarse category definitions, and 2) transferring the knowledge to a specific domain with fine-grained annotations. Extensive experiments demonstrate that our approach achieves strong performance across multiple domains of the M^{6}Doc dataset, substantially surpassing both existing layout generation experts and several recent general-purpose LLMs. Our code, models, and dataset will be publicly released.
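
The abstract describes the two-stage Coarse-to-Fine paradigm only at a high level. For intuition, the snippet below is a minimal sketch of how layouts might be serialized as plain-text sequences for autoregressive LLM training, first under coarse category labels (stage 1) and then under fine-grained ones (stage 2); the `Element` class, `COARSE_MAP`, and the serialization format are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch only: one way document layouts could be serialized as text
# for coarse-to-fine LLM training. Category names, COARSE_MAP, and the format
# are hypothetical, not the OmniLayout specification.
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    category: str  # fine-grained label, e.g. "headline" or "caption"
    x: float       # top-left corner, normalized to [0, 1]
    y: float
    w: float       # width and height, normalized to [0, 1]
    h: float


# Hypothetical mapping from fine-grained labels to the coarse categories
# used for stage-1 pretraining on OmniLayout-1M-style data.
COARSE_MAP = {
    "headline": "text", "paragraph": "text", "caption": "text",
    "photo": "figure", "chart": "figure",
    "table": "table", "advertisement": "other",
}


def serialize(layout: List[Element], coarse: bool) -> str:
    """Render a layout as a token sequence an LLM can model autoregressively."""
    parts = []
    # Deterministic reading order: top-to-bottom, then left-to-right.
    for e in sorted(layout, key=lambda e: (e.y, e.x)):
        label = COARSE_MAP.get(e.category, "other") if coarse else e.category
        parts.append(f"<{label}> {e.x:.2f} {e.y:.2f} {e.w:.2f} {e.h:.2f}")
    return " | ".join(parts)


if __name__ == "__main__":
    page = [
        Element("headline", 0.05, 0.03, 0.90, 0.08),
        Element("photo", 0.05, 0.14, 0.55, 0.30),
        Element("paragraph", 0.63, 0.14, 0.32, 0.30),
    ]
    print(serialize(page, coarse=True))   # stage 1: coarse categories
    print(serialize(page, coarse=False))  # stage 2: fine-grained categories
```

Sorting elements by (y, x) gives one simple, deterministic reading order, which is a common way to make long layout sequences easier for an autoregressive model to arrange coherently.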