DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations

Abstract

Large Language Models (LLMs) often hallucinate, producing unfaithful or factually incorrect outputs by misrepresenting the provided context or incorrectly recalling internal knowledge. Recent studies have identified specific attention heads within the Transformer architecture, known as retrieval heads, responsible for extracting relevant contextual information. We hypothesise that masking these retrieval heads can induce hallucinations and that contrasting the outputs of the base LLM and the masked LLM can reduce hallucinations. To this end, we propose Decoding by Contrasting Retrieval Heads (DeCoRe), a novel training-free decoding strategy that amplifies information found in the context and model parameters. DeCoRe mitigates potentially hallucinated responses by dynamically contrasting the outputs of the base LLM and the masked LLM, using conditional entropy as a guide. Our extensive experiments confirm that DeCoRe significantly improves performance on tasks requiring high contextual faithfulness, such as summarisation (XSum by 18.6%), instruction following (MemoTrap by 10.9%), and open-book question answering (NQ-Open by 2.4% and NQ-Swap by 5.5%).
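
To make the decoding idea concrete, the following is a minimal sketch of one contrastive decoding step, not the authors' implementation. The abstract does not state the exact formula, so the combination rule used here, (1 + alpha) * log p_base - alpha * log p_masked with alpha set to the entropy of the base model's next-token distribution, is an assumption borrowed from common contrastive decoding formulations. The function names (`log_softmax`, `entropy`, `contrast_step`) and the toy logits are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def entropy(log_probs):
    """Shannon entropy (in nats) of a distribution given as log-probabilities."""
    return -sum(math.exp(lp) * lp for lp in log_probs)


def contrast_step(base_logits, masked_logits):
    """Contrast next-token distributions of a base and a masked model.

    base_logits:   next-token logits from the unmodified LLM.
    masked_logits: next-token logits from the same LLM with its
                   retrieval heads masked.
    Returns (contrasted log-probabilities, alpha).
    """
    base_lp = log_softmax(base_logits)
    masked_lp = log_softmax(masked_logits)

    # Assumption: the contrast strength is the entropy of the base
    # distribution, so uncertain steps are contrasted more strongly.
    alpha = entropy(base_lp)

    # Assumption: standard contrastive combination in log space.
    scores = [(1 + alpha) * b - alpha * m for b, m in zip(base_lp, masked_lp)]
    return log_softmax(scores), alpha


# Toy three-token vocabulary: the base model narrowly prefers token 0,
# while the masked model prefers token 1.
base = [2.0, 1.9, 0.0]
masked = [1.0, 2.5, 0.0]
contrasted, alpha = contrast_step(base, masked)
best_token = max(range(len(contrasted)), key=lambda i: contrasted[i])
```

In this toy input, token 0 is only narrowly preferred by the base pass and is not preferred by the masked pass, so the contrast widens its lead. When the two passes agree exactly, the rule above returns the base distribution unchanged.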
