Adapting Safe-for-Work Classifier for Malaysian Language Text: Enhancing Alignment in LLM-Ops Framework

Abstract

As large language models (LLMs) become increasingly integrated into operational workflows (LLM-Ops), there is a pressing need for effective guardrails that ensure safe and aligned interactions, including the ability to detect potentially unsafe or inappropriate content across languages. However, existing safe-for-work classifiers focus primarily on English text. To address this gap, we present a novel safe-for-work text classifier tailored specifically to Malaysian language content. By curating and annotating a first-of-its-kind dataset of Malaysian text spanning multiple content categories, we trained a classification model capable of identifying potentially unsafe material using state-of-the-art natural language processing techniques. This work represents an important step toward safer interactions and content filtering that mitigate potential risks and support the responsible deployment of LLMs. To maximize accessibility and promote further research on enhancing alignment in LLM-Ops for the Malaysian context, the model is publicly released at https://huggingface.co/malaysia-ai/malaysian-sfw-classifier.
