

XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages

May 19, 2023
作者: Sebastian Ruder, Jonathan H. Clark, Alexander Gutkin, Mihir Kale, Min Ma, Massimo Nicosia, Shruti Rijhwani, Parker Riley, Jean-Michel A. Sarr, Xinyi Wang, John Wieting, Nitish Gupta, Anna Katanova, Christo Kirov, Dana L. Dickinson, Brian Roark, Bidisha Samanta, Connie Tao, David I. Adelani, Vera Axelrod, Isaac Caswell, Colin Cherry, Dan Garrette, Reeve Ingle, Melvin Johnson, Dmitry Panteleev, Partha Talukdar
cs.AI

Abstract

Data scarcity is a crucial issue for the development of highly multilingual NLP systems. Yet for many under-represented languages (ULs) -- languages for which NLP research is particularly far behind in meeting user needs -- it is feasible to annotate small amounts of data. Motivated by this, we propose XTREME-UP, a benchmark defined by: its focus on the scarce-data scenario rather than zero-shot; its focus on user-centric tasks -- tasks with broad adoption by speakers of high-resource languages; and its focus on under-represented languages where this scarce-data scenario tends to be most realistic. XTREME-UP evaluates the capabilities of language models across 88 under-represented languages over 9 key user-centric technologies including ASR, OCR, MT, and information access tasks that are of general utility. We create new datasets for OCR, autocomplete, semantic parsing, and transliteration, and build on and refine existing datasets for other tasks. XTREME-UP provides methodology for evaluating many modeling scenarios including text-only, multi-modal (vision, audio, and text), supervised parameter tuning, and in-context learning. We evaluate commonly used models on the benchmark. We release all code and scripts to train and evaluate models.
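To make the evaluation side concrete, character error rate (CER) is the kind of metric typically used to score tasks like ASR and OCR against reference transcriptions. The sketch below is a self-contained illustration of corpus-level CER, not the released XTREME-UP evaluation code; function names are hypothetical.

```python
# Minimal sketch of character error rate (CER), a standard metric for
# scoring ASR/OCR outputs. Illustrative only; not the XTREME-UP codebase.

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def cer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level CER: total character edits over total reference characters."""
    edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    chars = sum(len(r) for r in references)
    return edits / max(chars, 1)

# One deletion against a 9-character reference -> CER of 1/9.
print(cer(["xtreme up"], ["xtrem up"]))  # 0.111...
```

Corpus-level aggregation (summing edits before dividing) weights longer references proportionally, which is why it is usually preferred over averaging per-sentence CER.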