AMBEDKAR - A Multi-level Bias Elimination through a Decoding Approach with Knowledge Augmentation for Robust Constitutional Alignment of Language Models

Abstract

Large Language Models (LLMs) can inadvertently reflect societal biases present in their training data, leading to harmful or prejudiced outputs. In the Indian context, our empirical evaluations across a suite of models reveal that biases around caste and religion are particularly salient. Yet most existing mitigation strategies are Western-centric and fail to address these local nuances. We propose AMBEDKAR, a framework inspired by the egalitarian vision of Dr. B. R. Ambedkar, architect of the Indian Constitution, to guide LLM outputs toward fairness, neutrality, and inclusion in line with Articles 14 to 17. Our approach introduces a Constitution-Aware Decoding Layer, guided by the AI Constitution of India and applied only at inference time, without any parameter updates to the base model. We incorporate a speculative decoding algorithm that proactively reduces casteist and communal bias during generation. This mitigation layer operates directly within the decoding process, avoiding changes to model internals and lowering the computational and infrastructural costs associated with retraining. We reinterpret speculative decoding not merely as an efficiency tool but as a mechanism for fairness. In this framework, a Small Language Model (SLM) acts as a potentially biased generator, while a constitutionally guided Large Language Model (LLM) serves as the verifier. Rather than accelerating generation, the LLM enforces bias-robust trajectories in the SLM outputs. This inversion of roles gives rise to a fairness-by-speculation paradigm. Our approach yields an absolute reduction in bias of up to 26.41 percent compared to the baseline. Our source code, datasets, and results are available at https://anonymous.4open.science/r/AMBEDKAR-983B/
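
The draft-and-verify role split described in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: the abstract does not specify how the verifier scores or rejects drafts, so the function names (`fair_speculative_decode`, `toy_draft`, `toy_verify`), the blocklist, and the "pick the highest-scoring candidate" rule below are all hypothetical stand-ins. The drafter plays the role of the potentially biased SLM, and the verifier plays the role of the constitutionally guided LLM.

```python
from typing import Callable, List, Sequence

# Hypothetical interfaces (assumptions, not from the paper):
# a drafter maps a context to candidate next tokens,
# a verifier maps (context, candidate) to an acceptability score.
Drafter = Callable[[Sequence[str]], List[str]]
Verifier = Callable[[Sequence[str], str], float]


def fair_speculative_decode(
    prompt: Sequence[str],
    draft: Drafter,
    verify: Verifier,
    max_new_tokens: int = 8,
    eos: str = "<eos>",
) -> List[str]:
    """Let the drafter propose tokens and the verifier choose among them."""
    context = list(prompt)
    for _ in range(max_new_tokens):
        candidates = draft(context)
        if not candidates:
            break
        # The verifier, not the drafter, decides which candidate survives.
        best = max(candidates, key=lambda tok: verify(context, tok))
        if best == eos:
            break
        context.append(best)
    return context[len(prompt):]


def toy_draft(context: Sequence[str]) -> List[str]:
    """Toy stand-in for a biased SLM: its first proposal is the biased one."""
    if context and context[-1] == "are":
        return ["inferior", "equal"]
    return ["<eos>"]


# Toy stand-in for constitutional guidance: a fixed blocklist (an assumption).
BLOCKED = {"inferior"}


def toy_verify(context: Sequence[str], token: str) -> float:
    """Score 0.0 for blocked tokens and 1.0 for everything else."""
    return 0.0 if token in BLOCKED else 1.0


if __name__ == "__main__":
    prompt = ["all", "citizens", "are"]
    print("drafter's own first choice:", toy_draft(prompt)[0])
    print("verified continuation:", fair_speculative_decode(prompt, toy_draft, toy_verify))
```

In a real system both callables would wrap language models, and the verifier's score would presumably come from the larger model's token probabilities conditioned on the constitutional guidance; the sketch only shows where that decision sits in the decoding loop.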

AMBEDKAR-A Multi-level Bias Elimination through a Decoding Approach with Knowledge Augmentation for Robust Constitutional Alignment of Language Models
