Adapting Safe-for-Work Classifier for Malaysian Language Text: Enhancing Alignment in LLM-Ops Framework

Abstract


As large language models (LLMs) become increasingly integrated into operational workflows (LLM-Ops), there is a pressing need for effective guardrails to ensure safe and aligned interactions, including the ability to detect potentially unsafe or inappropriate content across languages. However, existing safe-for-work classifiers are primarily focused on English text. To address this gap for the Malaysian language, we present a novel safe-for-work text classifier tailored specifically for Malaysian language content. By curating and annotating a first-of-its-kind dataset of Malaysian text spanning multiple content categories, we trained a classification model capable of identifying potentially unsafe material using state-of-the-art natural language processing techniques. This work represents an important step in enabling safer interactions and content filtering to mitigate potential risks and ensure responsible deployment of LLMs. To maximize accessibility and promote further research towards enhancing alignment in LLM-Ops for the Malaysian context, the model is publicly released at https://huggingface.co/malaysia-ai/malaysian-sfw-classifier.
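As a minimal sketch of how a safe-for-work classifier can act as a guardrail in an LLM-Ops pipeline, the Python code below wraps any function that maps text to per-label probabilities and blocks text whose top label is unsafe with enough confidence. It is not the authors' implementation. The label names (`safe`, `unsafe`), the 0.5 default threshold, `make_guardrail`, and the keyword-based `toy_classifier` are all hypothetical placeholders. In practice `classify` would wrap the released checkpoint (for example through a Hugging Face `transformers` text-classification pipeline), but whether the checkpoint loads that way and which labels it emits are assumptions to verify against the model card.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical label name; the released model's actual label set may differ.
SAFE_LABEL = "safe"


@dataclass
class GuardrailDecision:
    allowed: bool  # True if the text may pass through the pipeline
    label: str     # highest-scoring label from the classifier
    score: float   # probability of that label


def make_guardrail(
    classify: Callable[[str], Dict[str, float]], threshold: float = 0.5
) -> Callable[[str], GuardrailDecision]:
    """Turn a text classifier (text -> {label: probability}) into an allow/block gate.

    Text is blocked only when the top label is not SAFE_LABEL and its
    probability is at least `threshold`.
    """

    def check(text: str) -> GuardrailDecision:
        scores = classify(text)
        label, score = max(scores.items(), key=lambda kv: kv[1])
        blocked = label != SAFE_LABEL and score >= threshold
        return GuardrailDecision(allowed=not blocked, label=label, score=score)

    return check


def toy_classifier(text: str) -> Dict[str, float]:
    # Hypothetical stand-in for the real model: flags a placeholder token.
    if "BLOCKME" in text:
        return {SAFE_LABEL: 0.1, "unsafe": 0.9}
    return {SAFE_LABEL: 0.95, "unsafe": 0.05}


if __name__ == "__main__":
    guard = make_guardrail(toy_classifier)
    for text in ["Selamat pagi, apa khabar?", "this text contains BLOCKME"]:
        decision = guard(text)
        print(text, "->", "allowed" if decision.allowed else f"blocked ({decision.label})")
```

Keeping the classifier behind a plain callable lets the same gate screen both user prompts and model outputs, and lets the threshold be tuned per deployment without touching the model.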
