ARC-Encoder: learning compressed text representations for large language models

Abstract

Recent techniques such as retrieval-augmented generation and chain-of-thought reasoning have led to longer contexts and higher inference costs. Context compression techniques can reduce these costs, but the most effective approaches require fine-tuning the target model or even modifying its architecture. This can degrade the model's general abilities when it is not used for this specific purpose. Here we explore an alternative approach: an encoder that compresses the context into continuous representations which replace token embeddings in decoder LLMs. First, we perform a systematic study of training strategies and architecture choices for the encoder. Our findings led to the design of an Adaptable text Representations Compressor, named ARC-Encoder, which outputs x times fewer continuous representations than text tokens (typically x ∈ {4, 8}). We evaluate ARC-Encoder across a variety of LLM usage scenarios, ranging from in-context learning to context window extension, on both instruct and base decoders. Results show that ARC-Encoder achieves state-of-the-art performance on several benchmarks while improving computational efficiency at inference. Finally, we demonstrate that our models can be adapted to multiple decoders simultaneously, allowing a single encoder to generalize across different decoder LLMs. This makes ARC-Encoder a flexible and efficient solution for portable encoders that work seamlessly with multiple LLMs. We release the training code at https://github.com/kyutai-labs/ARC-Encoder ; the fine-tuning dataset and pretrained models are available at https://huggingface.co/collections/kyutai/arc-encoders-68ee18787301407d60a57047 .
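To make the compression factor concrete, the following is a minimal sketch of what "x times fewer continuous representations" means for sequence length. Mean pooling over groups of x consecutive vectors is an assumption chosen purely for illustration; the abstract does not specify how ARC-Encoder produces its outputs, and this is not the authors' method. The names `mean_pool` and `tokens` are hypothetical.

```python
def mean_pool(vectors, x):
    """Average each run of x consecutive vectors.

    Illustrative stand-in only (assumption): the real encoder is a
    learned model, not a fixed pooling operation.
    """
    if x < 1:
        raise ValueError("x must be a positive integer")
    pooled = []
    for start in range(0, len(vectors), x):
        group = vectors[start:start + x]  # the last group may be shorter than x
        dim = len(group[0])
        pooled.append([sum(v[d] for v in group) / len(group) for d in range(dim)])
    return pooled


# Hypothetical context: 10 token embeddings of dimension 2.
tokens = [[float(i), float(2 * i)] for i in range(10)]

for x in (4, 8):
    compressed = mean_pool(tokens, x)
    print(f"x={x}: {len(tokens)} token vectors -> {len(compressed)} compressed vectors")
```

Under this assumption, a context of n token embeddings becomes roughly n / x vectors that the decoder would consume in place of the original token embeddings, which is where the inference savings described above would come from.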

