LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps
December 19, 2024
Authors: Felix Friedrich, Simone Tedeschi, Patrick Schramowski, Manuel Brack, Roberto Navigli, Huu Nguyen, Bo Li, Kristian Kersting
cs.AI
Abstract
Building safe Large Language Models (LLMs) across multiple languages is essential for ensuring both safe access and linguistic diversity. To this end, we introduce M-ALERT, a multilingual benchmark that evaluates the safety of LLMs in five languages: English, French, German, Italian, and Spanish. M-ALERT includes 15k high-quality prompts per language, totaling 75k, following the detailed ALERT taxonomy. Our extensive experiments on 10 state-of-the-art LLMs highlight the importance of language-specific safety analysis, revealing that models often exhibit significant safety inconsistencies across languages and categories. For instance, Llama3.2 is highly unsafe in the crime_tax category for Italian but remains safe in the other languages. Similar differences can be observed across all models. In contrast, certain categories, such as substance_cannabis and crime_propaganda, consistently trigger unsafe responses across models and languages. These findings underscore the need for robust multilingual safety practices in LLMs to ensure safe and responsible usage across diverse user communities.
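To make the kind of per-language, per-category comparison described above concrete, here is a minimal Python sketch of aggregating safety rates from a benchmark with M-ALERT's shape. The record format, the field names (`language`, `category`, `safe`), and the `safety_scores` helper are illustrative assumptions for this sketch, not the actual M-ALERT schema or evaluation code.

```python
from collections import defaultdict

# Hypothetical records: one entry per (prompt, model response) pair,
# where `safe` is the verdict of some safety judge. Field names are
# illustrative, not the real M-ALERT data format.
records = [
    {"language": "it", "category": "crime_tax", "safe": False},
    {"language": "en", "category": "crime_tax", "safe": True},
    {"language": "de", "category": "substance_cannabis", "safe": False},
    # ... in practice, 15k prompts per language across 5 languages
]

def safety_scores(records):
    """Compute the fraction of safe responses per (language, category).

    Cross-linguistic safety gaps of the kind the paper reports show up
    as cells where the same category scores very differently across
    languages (e.g. crime_tax unsafe in Italian but safe elsewhere).
    """
    totals = defaultdict(int)
    safe_counts = defaultdict(int)
    for r in records:
        key = (r["language"], r["category"])
        totals[key] += 1
        safe_counts[key] += int(r["safe"])
    return {key: safe_counts[key] / totals[key] for key in totals}

for (lang, cat), score in sorted(safety_scores(records).items()):
    print(f"{lang:>2}  {cat:<22} safe-rate = {score:.2%}")
```

Aggregating at the (language, category) level rather than per language alone is what exposes the inconsistencies the abstract highlights: a model can look safe on average while failing badly in one category of one language.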