AMBEDKAR - A Multi-level Bias Elimination through a Decoding Approach with Knowledge Augmentation for Robust Constitutional Alignment of Language Models

Abstract

Large Language Models (LLMs) can inadvertently reflect societal biases present in their training data, leading to harmful or prejudiced outputs. In the Indian context, our empirical evaluations across a suite of models reveal that biases around caste and religion are particularly salient. Yet most existing mitigation strategies are Western-centric and fail to address these local nuances. We propose AMBEDKAR, a framework inspired by the egalitarian vision of Dr B. R. Ambedkar, architect of the Indian Constitution, to guide LLM outputs toward fairness, neutrality, and inclusion in line with Articles 14 to 17 of the Constitution. Our approach introduces a Constitution-Aware Decoding Layer, guided by the AI Constitution of India and applied only at inference time, without any parameter updates to the base model. We incorporate a speculative decoding algorithm that proactively reduces casteist and communal bias during generation. Because this mitigation layer operates directly within the decoding process, it avoids changes to model internals and lowers the computational and infrastructural costs associated with retraining. We reinterpret speculative decoding not merely as an efficiency tool but as a mechanism for fairness. In this framework, a Small Language Model (SLM) acts as a potentially biased generator, while a constitutionally guided Large Language Model (LLM) serves as the verifier. Rather than accelerating generation, the LLM enforces bias-robust trajectories in the SLM outputs. This inversion of roles gives rise to a fairness-by-speculation paradigm. Our approach yields an absolute bias reduction of up to 26.41 percent compared to the baseline. Our source code, datasets, and results are available at https://anonymous.4open.science/r/AMBEDKAR-983B/
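The generator/verifier inversion described above can be illustrated with a minimal sketch. It assumes the standard speculative sampling acceptance rule: a drafted token x is accepted with probability min(1, p(x)/q(x)), where q is the drafter's distribution and p is the verifier's; on rejection, a replacement is drawn from the normalized residual max(0, p - q). The abstract does not specify AMBEDKAR's exact verification criterion, so this is not the authors' implementation. The function names, the toy next-token functions `toy_draft` (standing in for the potentially biased SLM) and `toy_target` (standing in for the constitution-guided LLM), and the placeholder token `"BIASED"` are all hypothetical.

```python
import random
from typing import Callable, Dict, List

Dist = Dict[str, float]  # next-token distribution: token -> probability
NextTokenFn = Callable[[List[str]], Dist]


def residual(p: Dist, q: Dist) -> Dist:
    """Normalized max(0, p - q): the distribution sampled after a rejection."""
    diff = {t: max(0.0, p.get(t, 0.0) - q.get(t, 0.0)) for t in p}
    total = sum(diff.values())
    if total == 0.0:
        return dict(p)  # p never exceeds q anywhere; fall back to the verifier
    return {t: v / total for t, v in diff.items()}


def sample(dist: Dist, rng: random.Random) -> str:
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


def speculative_step(prefix: List[str], draft: NextTokenFn, target: NextTokenFn,
                     k: int, rng: random.Random) -> List[str]:
    """One round: the drafter proposes k tokens, the verifier accepts or corrects."""
    drafted, q_dists, ctx = [], [], list(prefix)
    for _ in range(k):
        q = draft(ctx)
        tok = sample(q, rng)
        drafted.append(tok)
        q_dists.append(q)
        ctx.append(tok)

    out = list(prefix)
    for tok, q in zip(drafted, q_dists):
        p = target(out)
        if rng.random() < min(1.0, p.get(tok, 0.0) / q[tok]):
            out.append(tok)  # verifier accepts the drafted token
        else:
            out.append(sample(residual(p, q), rng))  # verifier's correction
            return out  # discard the remaining drafted tokens
    out.append(sample(target(out), rng))  # bonus token when all k are accepted
    return out


# Hypothetical toy models: the drafter puts mass on a flagged token,
# while the guided verifier assigns it zero probability.
def toy_draft(ctx: List[str]) -> Dist:
    return {"the": 0.4, "people": 0.3, "BIASED": 0.3}


def toy_target(ctx: List[str]) -> Dist:
    return {"the": 0.5, "people": 0.5, "BIASED": 0.0}


if __name__ == "__main__":
    rng = random.Random(0)
    seq: List[str] = []
    for _ in range(10):
        seq = speculative_step(seq, toy_draft, toy_target, k=4, rng=rng)
    print(" ".join(seq))
```

Because the toy verifier gives the flagged token zero probability, every drafted occurrence is rejected and the correction is drawn from the verifier's residual, so the token never reaches the output. In the usual efficiency setting, the same rule guarantees that the final text follows the verifier's distribution; the sketch simply repurposes that guarantee so the larger model filters the smaller model's proposals.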
