Adapting Safe-for-Work Classifier for Malaysian Language Text: Enhancing Alignment in LLM-Ops Framework
July 30, 2024
Authors: Aisyah Razak, Ariff Nazhan, Kamarul Adha, Wan Adzhar Faiq Adzlan, Mas Aisyah Ahmad, Ammar Azman
cs.AI
Abstract
As large language models (LLMs) become increasingly integrated into
operational workflows (LLM-Ops), there is a pressing need for effective
guardrails to ensure safe and aligned interactions, including the ability to
detect potentially unsafe or inappropriate content across languages. However,
existing safe-for-work classifiers focus primarily on English text. To
address this gap, we present a novel safe-for-work text classifier tailored
specifically to Malaysian-language content. By
curating and annotating a first-of-its-kind dataset of Malaysian text spanning
multiple content categories, we trained a classification model capable of
identifying potentially unsafe material using state-of-the-art natural language
processing techniques. This work represents an important step in enabling safer
interactions and content filtering to mitigate potential risks and ensure
responsible deployment of LLMs. To maximize accessibility and promote further
research towards enhancing alignment in LLM-Ops for the Malaysian context, the
model is publicly released at
https://huggingface.co/malaysia-ai/malaysian-sfw-classifier.