RoMath: A Romanian Mathematical Reasoning Benchmark

Abstract

Mathematics has long been conveyed through natural language, primarily for human understanding. With the rise of mechanized mathematics and proof assistants, there is a growing need to understand informal mathematical text, yet most existing benchmarks focus solely on English, overlooking other languages. This paper introduces RoMath, a Romanian mathematical reasoning benchmark suite comprising three datasets: RoMath-Baccalaureate, RoMath-Competitions and RoMath-Synthetic, which cover a range of mathematical domains and difficulty levels, aiming to improve non-English language models and promote multilingual AI development. By focusing on Romanian, a low-resource language with unique linguistic features, RoMath addresses the limitations of Anglo-centric models and emphasizes the need for dedicated resources beyond simple automatic translation. We benchmark several open-weight language models, highlighting the importance of creating resources for underrepresented languages. We make the code and dataset available.