Nougat: Neural Optical Understanding for Academic Documents

Abstract

Scientific knowledge is predominantly stored in books and scientific journals, often in the form of PDFs. However, the PDF format leads to a loss of semantic information, particularly for mathematical expressions. We propose Nougat (Neural Optical Understanding for Academic Documents), a Visual Transformer model that performs an Optical Character Recognition (OCR) task for processing scientific documents into a markup language, and we demonstrate the effectiveness of our model on a new dataset of scientific documents. By bridging the gap between human-readable documents and machine-readable text, the proposed approach offers a promising way to make scientific knowledge more accessible in the digital age. We release the models and code to accelerate future work on scientific text recognition.
