
Platypus: A Generalized Specialist Model for Reading Text in Various Forms

August 27, 2024
Authors: Peng Wang, Zhaohai Li, Jun Tang, Humen Zhong, Fei Huang, Zhibo Yang, Cong Yao
cs.AI

Abstract

Reading text from images (either natural scenes or documents) has been a long-standing research topic for decades, owing to its high technical challenge and wide range of applications. Previously, individual specialist models were developed to tackle the sub-tasks of text reading (e.g., scene text recognition, handwritten text recognition, and mathematical expression recognition). However, such specialist models usually cannot generalize effectively across different sub-tasks. Recently, generalist models (such as GPT-4V), trained on tremendous amounts of data in a unified way, have shown enormous potential for reading text in various scenarios, but they suffer from limited accuracy and low efficiency. In this work, we propose Platypus, a generalized specialist model for text reading. Specifically, Platypus combines the best of both worlds: it can recognize text of various forms with a single unified architecture while achieving excellent accuracy and high efficiency. To better exploit the advantages of Platypus, we also construct a text reading dataset (called Worms), whose images are curated from previous datasets and partially re-labeled. Experiments on standard benchmarks demonstrate the effectiveness and superiority of the proposed Platypus model. The model and data will be made publicly available at https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/OCR/Platypus.
