Platypus: A Generalized Specialist Model for Reading Text in Various Forms

August 27, 2024
Authors: Peng Wang, Zhaohai Li, Jun Tang, Humen Zhong, Fei Huang, Zhibo Yang, Cong Yao
cs.AI

Abstract

Reading text from images (either natural scenes or documents) has been a research topic for decades, owing to its high technical difficulty and wide range of applications. Previously, individual specialist models were developed to tackle the sub-tasks of text reading (e.g., scene text recognition, handwritten text recognition, and mathematical expression recognition). However, such specialist models usually cannot generalize effectively across different sub-tasks. Recently, generalist models (such as GPT-4V), trained on massive amounts of data in a unified way, have shown enormous potential for reading text in various scenarios, but they suffer from limited accuracy and low efficiency. In this work, we propose Platypus, a generalized specialist model for text reading. Specifically, Platypus combines the best of both worlds: it can recognize text of various forms with a single unified architecture while achieving excellent accuracy and high efficiency. To better exploit the advantages of Platypus, we also construct a text reading dataset (called Worms), whose images are curated from previous datasets and partially re-labeled. Experiments on standard benchmarks demonstrate the effectiveness and superiority of the proposed Platypus model. The model and data will be made publicly available at https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/OCR/Platypus.
