OmniLayout: Enabling Coarse-to-Fine Learning with LLMs for Universal Document Layout Generation

October 30, 2025
Authors: Hengrui Kang, Zhuangcheng Gu, Zhiyuan Zhao, Zichen Wen, Bin Wang, Weijia Li, Conghui He
cs.AI

Abstract

Document AI has advanced rapidly and is attracting increasing attention. Yet, while most efforts have focused on document layout analysis (DLA), its generative counterpart, document layout generation, remains underexplored. A major obstacle lies in the scarcity of diverse layouts: academic papers with Manhattan-style structures dominate existing studies, while open-world genres such as newspapers and magazines remain severely underrepresented. To address this gap, we curate OmniLayout-1M, the first million-scale dataset of diverse document layouts, covering six common document types and comprising contemporary layouts collected from multiple sources. Moreover, since existing methods struggle in complex domains and often fail to arrange long sequences coherently, we introduce OmniLayout-LLM, a 0.5B model trained with a two-stage Coarse-to-Fine learning paradigm: 1) learning universal layout principles from OmniLayout-1M with coarse category definitions, and 2) transferring that knowledge to a specific domain with fine-grained annotations. Extensive experiments demonstrate that our approach achieves strong performance across multiple domains of the M^{6}Doc dataset, substantially surpassing both existing layout generation experts and several recent general-purpose LLMs. Our code, models, and dataset will be publicly released.
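To make the two-stage coarse-to-fine idea concrete, the sketch below shows one plausible way a labeled layout (category plus bounding box) could be serialized into a token sequence for an LLM, first with coarse categories (stage 1, universal pretraining on OmniLayout-1M) and then with fine-grained ones (stage 2, domain transfer). The category vocabulary, the coarse-to-fine mapping, the coordinate quantization, and the function names are all illustrative assumptions; the abstract does not specify the actual serialization format.

```python
# Minimal sketch, assuming a hypothetical element schema and serialization format.
# Nothing here is taken from the paper's implementation.

# Hypothetical fine-grained -> coarse category mapping (stage-1 labels).
COARSE_MAP = {
    "paragraph": "text", "caption": "text",
    "figure": "image", "advertisement": "image",
    "headline": "title", "subhead": "title",
}

def serialize(elements, granularity="fine"):
    """Turn a layout (list of labeled boxes) into a text sequence an LLM can model.

    elements: list of (category, (x, y, w, h)) with coordinates normalized to [0, 1].
    granularity: "coarse" for stage-1 pretraining, "fine" for stage-2 transfer.
    """
    tokens = []
    for cat, (x, y, w, h) in elements:
        if granularity == "coarse":
            cat = COARSE_MAP.get(cat, cat)
        # Quantize coordinates to an integer grid so the model sees a small vocabulary.
        qx, qy, qw, qh = (round(v * 100) for v in (x, y, w, h))
        tokens.append(f"<{cat}> {qx} {qy} {qw} {qh}")
    return " ".join(tokens)

if __name__ == "__main__":
    newspaper_page = [
        ("headline", (0.05, 0.03, 0.90, 0.10)),
        ("paragraph", (0.05, 0.16, 0.43, 0.70)),
        ("figure", (0.52, 0.16, 0.43, 0.40)),
    ]
    # Stage 1: fit the LLM on millions of coarse sequences (universal layout rules).
    print(serialize(newspaper_page, granularity="coarse"))
    # Stage 2: continue training on fine-grained sequences from the target domain.
    print(serialize(newspaper_page, granularity="fine"))
```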