

XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages

May 19, 2023
Authors: Sebastian Ruder, Jonathan H. Clark, Alexander Gutkin, Mihir Kale, Min Ma, Massimo Nicosia, Shruti Rijhwani, Parker Riley, Jean-Michel A. Sarr, Xinyi Wang, John Wieting, Nitish Gupta, Anna Katanova, Christo Kirov, Dana L. Dickinson, Brian Roark, Bidisha Samanta, Connie Tao, David I. Adelani, Vera Axelrod, Isaac Caswell, Colin Cherry, Dan Garrette, Reeve Ingle, Melvin Johnson, Dmitry Panteleev, Partha Talukdar
cs.AI

Abstract

Data scarcity is a crucial issue for the development of highly multilingual NLP systems. Yet for many under-represented languages (ULs) -- languages for which NLP research is particularly far behind in meeting user needs -- it is feasible to annotate small amounts of data. Motivated by this, we propose XTREME-UP, a benchmark defined by: its focus on the scarce-data scenario rather than zero-shot; its focus on user-centric tasks -- tasks with broad adoption by speakers of high-resource languages; and its focus on under-represented languages where this scarce-data scenario tends to be most realistic. XTREME-UP evaluates the capabilities of language models across 88 under-represented languages over 9 key user-centric technologies including ASR, OCR, MT, and information access tasks that are of general utility. We create new datasets for OCR, autocomplete, semantic parsing, and transliteration, and build on and refine existing datasets for other tasks. XTREME-UP provides a methodology for evaluating many modeling scenarios including text-only, multi-modal (vision, audio, and text), supervised parameter tuning, and in-context learning. We evaluate commonly used models on the benchmark. We release all code and scripts to train and evaluate models.
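To make the scarce-data scenario concrete, the sketch below fine-tunes a small multilingual seq2seq model on a handful of annotated examples and then evaluates it on a held-out input. This is a minimal sketch, not the released XTREME-UP code: the model choice (google/mt5-small), the transliteration-style prompts, and the toy training pairs are all illustrative assumptions, not benchmark data.

```python
# Minimal sketch of scarce-data supervised tuning (NOT the official
# XTREME-UP harness): adapt a pretrained multilingual seq2seq model
# on a few (source, target) pairs, then decode a held-out example.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/mt5-small"  # arbitrary small multilingual model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Hypothetical scarce-data split: a few toy transliteration pairs.
train_pairs = [
    ("transliterate to Latin: नमस्ते", "namaste"),
    ("transliterate to Latin: धन्यवाद", "dhanyavaad"),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model.train()
for epoch in range(3):  # a few passes over the tiny training set
    for source, target in train_pairs:
        inputs = tokenizer(source, return_tensors="pt")
        labels = tokenizer(target, return_tensors="pt").input_ids
        loss = model(**inputs, labels=labels).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Decode a held-out example with the tuned model.
model.eval()
test = tokenizer("transliterate to Latin: शुक्रिया", return_tensors="pt")
with torch.no_grad():
    output = model.generate(**test, max_new_tokens=16)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The same loop structure would apply to any of the text-to-text tasks the benchmark covers; in-context learning would instead place the annotated pairs in the prompt and call generate without any parameter updates.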