Make Your LLM Fully Utilize the Context

Abstract

While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within a long context, a problem known as the lost-in-the-middle challenge. We hypothesize that it stems from insufficient explicit supervision during long-context training, which fails to emphasize that any position in a long context can hold crucial information. Based on this intuition, our study presents information-intensive (IN2) training, a purely data-driven solution to overcome lost-in-the-middle. Specifically, IN2 training leverages a synthesized long-context question-answer dataset, where the answer requires (1) fine-grained information awareness on a short segment (~128 tokens) within a synthesized long context (4K-32K tokens), and (2) the integration and reasoning of information from two or more short segments. By applying this information-intensive training to Mistral-7B, we present FILM-7B (FILl-in-the-Middle). To thoroughly assess the ability of FILM-7B to utilize long contexts, we design three probing tasks that encompass various context styles (document, code, and structured-data context) and information retrieval patterns (forward, backward, and bi-directional retrieval). The probing results demonstrate that FILM-7B can robustly retrieve information from different positions in its 32K context window. Beyond these probing tasks, FILM-7B significantly improves performance on real-world long-context tasks (e.g., 23.5->26.9 F1 score on NarrativeQA), while maintaining comparable performance on short-context tasks (e.g., 59.3->59.2 accuracy on MMLU). GitHub link: https://github.com/microsoft/FILM.
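The data construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `synthesize_long_context`, its parameters, and the use of whitespace-separated words as a stand-in for tokens are all assumptions made for illustration. It covers only the single-segment case, in which one short segment holding the answer is placed at a random position among filler segments so that any position in the long context can hold the crucial information.

```python
import random


def synthesize_long_context(key_segment, filler_segments, target_tokens, rng):
    """Place key_segment at a random position among shuffled filler segments.

    Hypothetical helper, not taken from the FILM repository.
    Token counts are approximated by whitespace-separated words.
    Returns (context_string, index_of_key_segment_among_segments).
    """
    # Reserve room for the key segment, then fill the rest with fillers.
    budget = target_tokens - len(key_segment.split())
    pool = list(filler_segments)
    rng.shuffle(pool)

    chosen = []
    for segment in pool:
        n = len(segment.split())
        if n > budget:
            break
        chosen.append(segment)
        budget -= n

    # randint is inclusive on both ends, so the key segment can land
    # at the start, the end, or anywhere in between.
    position = rng.randint(0, len(chosen))
    chosen.insert(position, key_segment)
    return "\n".join(chosen), position


if __name__ == "__main__":
    rng = random.Random(0)
    key = "KEY " + " ".join(["k"] * 127)                      # ~128 "tokens"
    fillers = [" ".join([f"f{i}"] * 128) for i in range(40)]  # filler pool
    context, position = synthesize_long_context(key, fillers, 4096, rng)
    print(len(context.split()), position)
```

In a real setup the word count would be replaced by the model's tokenizer, and a question-answer pair whose answer depends on the key segment would be attached to the returned context.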
