A Tale of Trust and Accuracy: Base vs. Instruct LLMs in RAG Systems

Abstract

Retrieval-Augmented Generation (RAG) represents a significant advancement in artificial intelligence, combining a retrieval phase with a generative phase, the latter typically powered by large language models (LLMs). Common practice in RAG is to use "instructed" LLMs, which are fine-tuned with supervised training to improve their ability to follow instructions and are aligned with human preferences using state-of-the-art techniques. Contrary to popular belief, our study demonstrates that base models outperform their instructed counterparts in RAG tasks by 20% on average under our experimental settings. This finding challenges the prevailing assumptions about the superiority of instructed LLMs in RAG applications. Further investigation reveals a more nuanced situation, questioning fundamental aspects of RAG and suggesting the need for broader discussion of the topic; or, as Fromm would have it, "Seldom is a glance at the statistics enough to understand the meaning of the figures".
