Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations

Abstract

The alignment of large language models is usually done by model providers to add or control behaviors that are common or universally understood across use cases and contexts. In contrast, this article presents an approach and architecture that empowers application developers to tune a model to their particular values, social norms, laws, and other regulations, and to orchestrate between potentially conflicting requirements in context. We lay out the three main components of such an Alignment Studio architecture: Framers, Instructors, and Auditors, which work in concert to control the behavior of a language model. We illustrate the approach with a running example of aligning a company's internal-facing enterprise chatbot to its business conduct guidelines.
