Adapting Safe-for-Work Classifier for Malaysian Language Text: Enhancing Alignment in LLM-Ops Framework
July 30, 2024
Authors: Aisyah Razak, Ariff Nazhan, Kamarul Adha, Wan Adzhar Faiq Adzlan, Mas Aisyah Ahmad, Ammar Azman
cs.AI
Abstract
As large language models (LLMs) become increasingly integrated into
operational workflows (LLM-Ops), there is a pressing need for effective
guardrails to ensure safe and aligned interactions, including the ability to
detect potentially unsafe or inappropriate content across languages. However,
existing safe-for-work classifiers are primarily focused on English text. To
address this gap, we present a novel safe-for-work text classifier tailored
specifically to Malaysian language content. By
curating and annotating a first-of-its-kind dataset of Malaysian text spanning
multiple content categories, we trained a classification model capable of
identifying potentially unsafe material using state-of-the-art natural language
processing techniques. This work represents an important step in enabling safer
interactions and content filtering to mitigate potential risks and ensure
responsible deployment of LLMs. To maximize accessibility and promote further
research towards enhancing alignment in LLM-Ops for the Malaysian context, the
model is publicly released at
https://huggingface.co/malaysia-ai/malaysian-sfw-classifier.