Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models

Abstract

Listwise rerankers based on large language models (LLMs) are the zero-shot state of the art. However, current work in this direction depends entirely on GPT models, making them a single point of failure for scientific reproducibility. Moreover, it raises the concern that current research findings hold only for GPT models and not for LLMs in general. In this work, we lift this precondition and build, for the first time, effective listwise rerankers without any form of dependency on GPT. Our passage retrieval experiments show that our best listwise reranker surpasses the listwise rerankers based on GPT-3.5 by 13% and achieves 97% of the effectiveness of those built on GPT-4. Our results also show that the existing training datasets, which were expressly constructed for pointwise ranking, are insufficient for building such listwise rerankers. Instead, high-quality listwise ranking data is required and crucial, calling for further work on building human-annotated listwise data resources.
