From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge

Abstract

Assessment and evaluation have long been critical challenges in artificial intelligence (AI) and natural language processing (NLP). However, traditional methods, whether matching-based or embedding-based, often fall short when judging subtle attributes and fail to deliver satisfactory results. Recent advances in Large Language Models (LLMs) have inspired the "LLM-as-a-judge" paradigm, in which LLMs are used to perform scoring, ranking, or selection across various tasks and applications. This paper provides a comprehensive survey of LLM-based judgment and assessment, offering an in-depth overview to advance this emerging field. We begin by giving detailed definitions from both input and output perspectives. We then introduce a comprehensive taxonomy that explores LLM-as-a-judge along three dimensions: what to judge, how to judge, and where to judge. Finally, we compile benchmarks for evaluating LLM-as-a-judge and highlight key challenges and promising directions, aiming to provide valuable insights and inspire future research in this area. A paper list and more resources about LLM-as-a-judge can be found at https://github.com/llm-as-a-judge/Awesome-LLM-as-a-judge and https://llm-as-a-judge.github.io.
