Adapting Safe-for-Work Classifier for Malaysian Language Text: Enhancing Alignment in LLM-Ops Framework

Abstract

As large language models (LLMs) become increasingly integrated into operational workflows (LLM-Ops), there is a pressing need for effective guardrails that ensure safe and aligned interactions, including the ability to detect potentially unsafe or inappropriate content across languages. However, existing safe-for-work classifiers focus primarily on English text. To address this gap, we present a novel safe-for-work text classifier tailored specifically to Malaysian-language content. By curating and annotating a first-of-its-kind dataset of Malaysian text spanning multiple content categories, we trained a classification model capable of identifying potentially unsafe material using state-of-the-art natural language processing techniques. This work is an important step toward safer interactions and content filtering that mitigate potential risks and support responsible deployment of LLMs. To maximize accessibility and promote further research on alignment in LLM-Ops for the Malaysian context, the model is publicly released at https://huggingface.co/malaysia-ai/malaysian-sfw-classifier.
