RabakBench: Scaling Human Annotations to Construct Localized Multilingual Safety Benchmarks for Low-Resource Languages

July 8, 2025
Authors: Gabriel Chua, Leanne Tan, Ziyu Ge, Roy Ka-Wei Lee
cs.AI

Abstract

Large language models (LLMs) and their safety classifiers often perform poorly on low-resource languages due to limited training data and evaluation benchmarks. This paper introduces RabakBench, a new multilingual safety benchmark localized to Singapore's unique linguistic context, covering Singlish, Chinese, Malay, and Tamil. RabakBench is constructed through a scalable three-stage pipeline: (i) Generate - adversarial example generation by augmenting real Singlish web content with LLM-driven red teaming; (ii) Label - semi-automated multi-label safety annotation using majority-voted LLM labelers aligned with human judgments; and (iii) Translate - high-fidelity translation preserving linguistic nuance and toxicity across languages. The final dataset comprises over 5,000 safety-labeled examples across four languages and six fine-grained safety categories with severity levels. Evaluations of 11 popular open-source and closed-source guardrail classifiers reveal significant performance degradation. RabakBench not only enables robust safety evaluation in Southeast Asian multilingual settings but also offers a reproducible framework for building localized safety datasets in low-resource environments. The benchmark dataset, including the human-verified translations, and evaluation code are publicly available.
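
The Label stage above relies on majority-voted LLM labelers producing multi-label annotations with severity levels. A minimal sketch of one way such votes could be aggregated follows. It is not the authors' implementation: the function name `majority_vote`, the placeholder category names, the convention that severity 0 means "not flagged", and the fallback to 0 when no level wins a strict majority are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List


def majority_vote(votes: List[Dict[str, int]]) -> Dict[str, int]:
    """Aggregate per-category severity levels from several labelers.

    Each vote maps a category name to a severity level (assumed: 0 = not flagged).
    A level is accepted for a category only if a strict majority of labelers
    chose it; otherwise the category falls back to 0 (an assumed tie policy).
    """
    # Collect every category mentioned by at least one labeler.
    categories = sorted({cat for vote in votes for cat in vote})
    result: Dict[str, int] = {}
    for cat in categories:
        # A labeler that omits a category is treated as voting 0 for it.
        counts = Counter(vote.get(cat, 0) for vote in votes)
        level, n = counts.most_common(1)[0]
        result[cat] = level if n > len(votes) / 2 else 0
    return result


# Hypothetical votes from three labelers over two placeholder categories.
votes = [
    {"category_a": 1, "category_b": 0},
    {"category_a": 1, "category_b": 2},
    {"category_a": 0, "category_b": 2},
]
labels = majority_vote(votes)
```

Requiring a strict majority keeps a label only when most labelers agree on the exact severity level; a real pipeline might instead escalate disagreements to human review, which the abstract does not specify.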