
NVIDIA Nemotron Parse 1.1

November 25, 2025
Authors: Kateryna Chumachenko, Amala Sanjay Deshmukh, Jarno Seppanen, Ilia Karmanov, Chia-Chih Chen, Lukas Voegtle, Philipp Fischer, Marek Wawrzos, Saeid Motiian, Roman Ageev, Kedi Wu, Alexandre Milesi, Maryam Moosaei, Krzysztof Pawelec, Padmavathy Subramanian, Mehrzad Samadi, Xin Yu, Celina Dear, Sarah Stoddard, Jenna Diamond, Jesse Oliver, Leanna Chraghchian, Patrick Skelly, Tom Balough, Yao Xu, Jane Polak Scowcroft, Daniel Korzekwa, Darragh Hanley, Sandip Bhaskar, Timo Roman, Karan Sapra, Andrew Tao, Bryan Catanzaro
cs.AI

Abstract

We introduce Nemotron-Parse-1.1, a lightweight document parsing and OCR model that advances the capabilities of its predecessor, Nemoretriever-Parse-1.0. Nemotron-Parse-1.1 delivers improved capabilities across general OCR, markdown formatting, structured table parsing, and text extraction from pictures, charts, and diagrams. It also supports a longer output sequence length for visually dense documents. As with its predecessor, it extracts bounding boxes of text segments, as well as corresponding semantic classes. Nemotron-Parse-1.1 follows an encoder-decoder architecture with 885M parameters, including a compact 256M-parameter language decoder. It achieves competitive accuracy on public benchmarks, making it a strong lightweight OCR solution. We release the model weights publicly on Huggingface, as well as an optimized NIM container, along with a subset of the training data as part of the broader Nemotron-VLM-v2 dataset. Additionally, we release Nemotron-Parse-1.1-TC, which operates on a reduced vision token length, offering a 20% speed improvement with minimal quality degradation.
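As a quick check of the parameter budget stated above: with 885M parameters in total and 256M in the language decoder, about 629M parameters (roughly 71% of the model) lie outside the decoder. The abstract does not itemize those 629M, so attributing them mainly to the vision encoder (plus any adapter layers) is an assumption. A minimal sketch of the arithmetic, where the function name `parameter_split` is purely illustrative:

```python
# Parameter counts stated in the abstract, in millions.
TOTAL_PARAMS_M = 885
DECODER_PARAMS_M = 256


def parameter_split(total_m: int, decoder_m: int) -> tuple[int, float]:
    """Return (non-decoder parameters in millions, decoder share of the total).

    Assumption: everything outside the language decoder (vision encoder and any
    connecting layers) is lumped together, since the abstract gives no breakdown.
    """
    if not 0 < decoder_m <= total_m:
        raise ValueError("decoder must be a positive part of the total")
    return total_m - decoder_m, decoder_m / total_m


rest_m, decoder_share = parameter_split(TOTAL_PARAMS_M, DECODER_PARAMS_M)
print(f"non-decoder parameters: {rest_m}M")
print(f"decoder share of total: {decoder_share:.1%}")
```

Running it prints 629M for the non-decoder part and a decoder share of 28.9%, which is what "compact" decoder means in relative terms here.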
December 1, 2025