

DeepSeek-OCR: Contexts Optical Compression

October 21, 2025
Authors: Haoran Wei, Yaofeng Sun, Yukun Li
cs.AI

Abstract

We present DeepSeek-OCR as an initial investigation into the feasibility of compressing long contexts via optical 2D mapping. DeepSeek-OCR consists of two components: DeepEncoder and DeepSeek3B-MoE-A570M as the decoder. Specifically, DeepEncoder serves as the core engine, designed to maintain low activations under high-resolution input while achieving high compression ratios to ensure an optimal and manageable number of vision tokens. Experiments show that when the number of text tokens is within 10 times that of vision tokens (i.e., a compression ratio < 10x), the model can achieve decoding (OCR) precision of 97%. Even at a compression ratio of 20x, OCR accuracy remains at about 60%. This shows considerable promise for research areas such as historical long-context compression and memory forgetting mechanisms in LLMs. Beyond this, DeepSeek-OCR also demonstrates high practical value. On OmniDocBench, it surpasses GOT-OCR2.0 (256 tokens/page) using only 100 vision tokens, and outperforms MinerU2.0 (6000+ tokens per page on average) while utilizing fewer than 800 vision tokens. In production, DeepSeek-OCR can generate training data for LLMs/VLMs at a scale of 200k+ pages per day on a single A100-40G GPU. Code and model weights are publicly accessible at http://github.com/deepseek-ai/DeepSeek-OCR.
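
To make the compression-ratio figures above concrete, the following minimal Python sketch computes the ratio of text tokens to vision tokens as described in the abstract. The function name and the token counts are illustrative assumptions, not taken from the paper's code.

```python
# Illustrative sketch (not from the DeepSeek-OCR codebase): the optical
# compression ratio is the number of text tokens represented per vision token.

def compression_ratio(num_text_tokens: int, num_vision_tokens: int) -> float:
    """Text tokens divided by vision tokens for a given page."""
    return num_text_tokens / num_vision_tokens

# Hypothetical example: a page whose text tokenizes to ~1000 tokens, rendered
# and encoded into 100 vision tokens, gives a 10x ratio -- the regime where
# the paper reports ~97% OCR decoding precision.
print(compression_ratio(1000, 100))  # 10.0
```

At 20x (e.g., 2000 text tokens encoded into the same 100 vision tokens), the abstract reports accuracy dropping to roughly 60%.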