RabakBench: Scaling Human Annotations to Construct Localized Multilingual Safety Benchmarks for Low-Resource Languages

Abstract

Large language models (LLMs) and their safety classifiers often perform poorly on low-resource languages because training data and evaluation benchmarks are limited. This paper introduces RabakBench, a new multilingual safety benchmark localized to Singapore's unique linguistic context, covering Singlish, Chinese, Malay, and Tamil. RabakBench is constructed through a scalable three-stage pipeline: (i) Generate - adversarial example generation by augmenting real Singlish web content with LLM-driven red teaming; (ii) Label - semi-automated multi-label safety annotation using majority-voted LLM labelers aligned with human judgments; and (iii) Translate - high-fidelity translation preserving linguistic nuance and toxicity across languages. The final dataset comprises over 5,000 safety-labeled examples across four languages and six fine-grained safety categories with severity levels. Evaluations of 11 popular open-source and closed-source guardrail classifiers reveal significant performance degradation. RabakBench not only enables robust safety evaluation in Southeast Asian multilingual settings but also offers a reproducible framework for building localized safety datasets in low-resource environments. The benchmark dataset, including the human-verified translations, and the evaluation code are publicly available.
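The Label stage relies on majority voting across several LLM labelers for multi-label annotation. The following is a minimal sketch of how such a vote could be aggregated per category. It is not the authors' implementation: the function name `majority_vote`, the strict-majority tie rule, and the category names `category_a` and `category_b` are hypothetical placeholders, since the abstract does not name the six categories or specify how ties are handled.

```python
from typing import Dict, List


def majority_vote(votes: List[Dict[str, bool]]) -> Dict[str, bool]:
    """Aggregate per-category binary votes from several labelers.

    Each element of `votes` maps a category name to that labeler's
    yes/no decision. A category is marked positive only when strictly
    more than half of the labelers flag it (an assumed tie rule).
    """
    if not votes:
        raise ValueError("need at least one labeler vote")
    result: Dict[str, bool] = {}
    for category in votes[0]:
        positives = sum(1 for vote in votes if vote[category])
        result[category] = positives * 2 > len(votes)
    return result


# Three hypothetical labelers voting on two placeholder categories.
votes = [
    {"category_a": True, "category_b": False},
    {"category_a": True, "category_b": True},
    {"category_a": False, "category_b": False},
]
print(majority_vote(votes))  # {'category_a': True, 'category_b': False}
```

Using an odd number of labelers avoids ties entirely; with an even number, this sketch resolves a tie as negative, which is one possible convention among several.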
