In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss

Abstract

This paper addresses the challenge of processing long documents with generative transformer models. To evaluate different approaches, we introduce BABILong, a new benchmark designed to assess how well models extract and process facts distributed across extensive texts. Our evaluation, which includes GPT-4 and RAG, shows that common methods are effective only for sequences of up to 10^4 elements. In contrast, fine-tuning GPT-2 with recurrent memory augmentations enables it to handle tasks involving up to 10^7 elements. This is by far the longest input processed by any open neural network model to date, and it demonstrates a significant improvement in long-sequence processing.
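To make the idea of facts distributed within extensive text concrete, the following is a minimal sketch of how such a sample could be assembled. It is not the BABILong generator: the function name build_sample, the two facts, the filler sentences, and the question are all hypothetical and chosen only for illustration.

```python
import random


def build_sample(facts, filler, total_sentences, seed=0):
    """Scatter `facts` (kept in order) among sentences drawn from `filler`.

    Hypothetical helper for illustration only; not part of BABILong.
    """
    if total_sentences < len(facts):
        raise ValueError("total_sentences must be at least len(facts)")
    rng = random.Random(seed)
    # Pick distinct positions for the facts; sorting preserves their order.
    positions = sorted(rng.sample(range(total_sentences), len(facts)))
    # Start from pure distractor text, then overwrite the chosen positions.
    sentences = [rng.choice(filler) for _ in range(total_sentences)]
    for pos, fact in zip(positions, facts):
        sentences[pos] = fact
    return " ".join(sentences), positions


# Assumed toy data: two supporting facts hidden among 1000 sentences.
facts = ["Mary went to the kitchen.", "Mary picked up the apple."]
filler = ["The weather was mild that day.", "A train passed in the distance."]
text, positions = build_sample(facts, filler, total_sentences=1000)
question = "Where is the apple?"
```

Answering the question requires locating both facts and combining them, and raising total_sentences lengthens the haystack without changing the task itself.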
