AMBEDKAR: A Multi-level Bias Elimination through a Decoding Approach with Knowledge Augmentation for Robust Constitutional Alignment of Language Models
September 2, 2025
作者: Snehasis Mukhopadhyay, Aryan Kasat, Shivam Dubey, Rahul Karthikeyan, Dhruv Sood, Vinija Jain, Aman Chadha, Amitava Das
cs.AI
Abstract
Large Language Models (LLMs) can inadvertently reflect societal biases
present in their training data, leading to harmful or prejudiced outputs. In
the Indian context, our empirical evaluations across a suite of models reveal
that biases around caste and religion are particularly salient. Yet, most
existing mitigation strategies are Western-centric and fail to address these
local nuances. We propose AMBEDKAR, a framework inspired by the egalitarian
vision of Dr B. R. Ambedkar, architect of the Indian Constitution, to guide LLM
outputs toward fairness, neutrality, and inclusion in line with Articles 14 to
17. Our approach introduces a Constitution-Aware Decoding Layer, guided by the
AI Constitution of India and applied only at inference time, without any
parameter updates to the base model. We incorporate a speculative decoding
algorithm that proactively reduces casteist and communal bias during
generation. This mitigation layer operates directly within the decoding
process, avoiding changes to model internals and lowering the computational and
infrastructural costs associated with retraining. We reinterpret speculative
decoding not merely as an efficiency tool but as a mechanism for fairness. In
this framework, a Small Language Model (SLM) acts as a potentially biased
generator, while a constitutionally guided Large Language Model (LLM) serves as
the verifier. Rather than accelerating generation, the LLM enforces bias-robust
trajectories in the SLM outputs. This inversion of roles gives rise to a
fairness-by-speculation paradigm. Our approach yields an absolute reduction in
bias of up to 26.41 percent compared to the baseline. Our source code, datasets, and
results are available at https://anonymous.4open.science/r/AMBEDKAR-983B/
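The abstract describes speculative decoding with the usual roles inverted: the SLM drafts tokens and the constitutionally guided LLM accepts or rejects them. The following is a minimal sketch of that loop, not the authors' implementation; the callables draft_step (SLM drafter), verifier_ok (constitution-aware LLM verifier), and safe_fallback (verifier-approved replacement token) are hypothetical placeholders standing in for the real models.

# Minimal sketch (assumptions noted above) of a fairness-by-speculation decoding loop:
# an SLM drafts tokens, a constitution-guided LLM verifies them, and any rejected
# token is replaced by a verifier-approved fallback before drafting resumes.
from typing import Callable, List

def fairness_by_speculation(
    prompt_tokens: List[str],
    draft_step: Callable[[List[str]], List[str]],   # hypothetical: SLM proposes a block of draft tokens
    verifier_ok: Callable[[List[str], str], bool],  # hypothetical: constitution-aware LLM accepts/rejects a token
    safe_fallback: Callable[[List[str]], str],      # hypothetical: verifier-approved replacement token
    max_new_tokens: int = 32,
) -> List[str]:
    output = list(prompt_tokens)
    while len(output) - len(prompt_tokens) < max_new_tokens:
        draft = draft_step(output)          # SLM: potentially biased draft continuation
        if not draft:
            break
        for token in draft:
            if verifier_ok(output, token):  # LLM verifier keeps the trajectory bias-robust
                output.append(token)
            else:
                output.append(safe_fallback(output))  # substitute a constitution-aligned token
                break                       # resume drafting from the corrected prefix
    return output

In this sketch the large model never generates ahead of the small one; it only gates each drafted token, which reflects the abstract's claim that speculative decoding is repurposed as a fairness mechanism rather than an acceleration technique.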