NVIDIA Nemotron Parse 1.1
November 25, 2025
Authors: Kateryna Chumachenko, Amala Sanjay Deshmukh, Jarno Seppanen, Ilia Karmanov, Chia-Chih Chen, Lukas Voegtle, Philipp Fischer, Marek Wawrzos, Saeid Motiian, Roman Ageev, Kedi Wu, Alexandre Milesi, Maryam Moosaei, Krzysztof Pawelec, Padmavathy Subramanian, Mehrzad Samadi, Xin Yu, Celina Dear, Sarah Stoddard, Jenna Diamond, Jesse Oliver, Leanna Chraghchian, Patrick Skelly, Tom Balough, Yao Xu, Jane Polak Scowcroft, Daniel Korzekwa, Darragh Hanley, Sandip Bhaskar, Timo Roman, Karan Sapra, Andrew Tao, Bryan Catanzaro
cs.AI
Abstract
We introduce Nemotron-Parse-1.1, a lightweight document parsing and OCR model that advances the capabilities of its predecessor, Nemoretriever-Parse-1.0. Nemotron-Parse-1.1 delivers improved capabilities across general OCR, markdown formatting, structured table parsing, and text extraction from pictures, charts, and diagrams. It also supports a longer output sequence length for visually dense documents. As with its predecessor, it extracts bounding boxes of text segments, as well as their corresponding semantic classes. Nemotron-Parse-1.1 follows an encoder-decoder architecture with 885M parameters, including a compact 256M-parameter language decoder. It achieves competitive accuracy on public benchmarks, making it a strong lightweight OCR solution. We release the model weights publicly on Huggingface, together with an optimized NIM container and a subset of the training data as part of the broader Nemotron-VLM-v2 dataset. Additionally, we release Nemotron-Parse-1.1-TC, which operates on a reduced vision token length, offering a 20% speed improvement with minimal quality degradation.