LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps
December 19, 2024
Authors: Felix Friedrich, Simone Tedeschi, Patrick Schramowski, Manuel Brack, Roberto Navigli, Huu Nguyen, Bo Li, Kristian Kersting
cs.AI
Abstract
Building safe Large Language Models (LLMs) across multiple languages is essential for ensuring both safe access and linguistic diversity. To this end, we introduce M-ALERT, a multilingual benchmark that evaluates the safety of LLMs in five languages: English, French, German, Italian, and Spanish. M-ALERT includes 15k high-quality prompts per language, totaling 75k, following the detailed ALERT taxonomy. Our extensive experiments on 10 state-of-the-art LLMs highlight the importance of language-specific safety analysis, revealing that models often exhibit significant safety inconsistencies across languages and categories. For instance, Llama3.2 is highly unsafe in the crime_tax category for Italian but remains safe in the other languages. Similar differences can be observed across all models. In contrast, certain categories, such as substance_cannabis and crime_propaganda, consistently trigger unsafe responses across models and languages. These findings underscore the need for robust multilingual safety practices in LLMs to ensure safe and responsible usage across diverse user communities.
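To make the kind of per-language, per-category comparison described above concrete, here is a minimal Python sketch of aggregating safety rates from a benchmark with M-ALERT's shape. The record format, the field names (`language`, `category`, `safe`), and the `safety_scores` helper are illustrative assumptions for this sketch, not the actual M-ALERT schema or evaluation code.

```python
from collections import defaultdict

# Hypothetical records: one entry per (prompt, model response) pair,
# where `safe` is the verdict of some safety judge. Field names are
# illustrative, not the real M-ALERT data format.
records = [
    {"language": "it", "category": "crime_tax", "safe": False},
    {"language": "en", "category": "crime_tax", "safe": True},
    {"language": "de", "category": "substance_cannabis", "safe": False},
    # ... in practice, 15k prompts per language across 5 languages
]

def safety_scores(records):
    """Compute the fraction of safe responses per (language, category).

    Cross-linguistic safety gaps of the kind the paper reports show up
    as cells where the same category scores very differently across
    languages (e.g. crime_tax unsafe in Italian but safe elsewhere).
    """
    totals = defaultdict(int)
    safe_counts = defaultdict(int)
    for r in records:
        key = (r["language"], r["category"])
        totals[key] += 1
        safe_counts[key] += int(r["safe"])
    return {key: safe_counts[key] / totals[key] for key in totals}

for (lang, cat), score in sorted(safety_scores(records).items()):
    print(f"{lang:>2}  {cat:<22} safe-rate = {score:.2%}")
```

Aggregating at the (language, category) level rather than per language alone is what exposes the inconsistencies the abstract highlights: a model can look safe on average while failing badly in one category of one language.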