LACoS-BLOOM: Low-rank Adaptation with Contrastive objective on 8 bits Siamese-BLOOM

Abstract

Text embeddings are useful features for several NLP applications, such as sentence similarity, text clustering, and semantic search. In this paper, we present a Low-rank Adaptation with a Contrastive objective on top of 8-bit Siamese-BLOOM, a multilingual large language model optimized to produce semantically meaningful word embeddings. The innovation is threefold. First, we cast BLOOM weights to 8-bit values. Second, we fine-tune BLOOM with a scalable adapter (LoRA) and an 8-bit Adam optimizer for sentence similarity classification. Third, we apply a Siamese architecture to the BLOOM model with a contrastive objective to ease the scarcity of multilingual labeled data. The experimental results show that the quality of the embeddings learned by LACoS-BLOOM is proportional to the number of model parameters and the amount of unlabeled training data. With the parameter-efficient fine-tuning design, we are able to run BLOOM with 7.1 billion parameters end-to-end on a single GPU machine with 32GB of memory. Compared to the previous solution, Sentence-BERT, we achieve significant improvement on both English and multilingual STS tasks.
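To make the third ingredient concrete, the following is a minimal sketch of a contrastive objective over the outputs of a Siamese (shared-weight) encoder. The abstract does not specify the exact loss, so the in-batch-negatives cross-entropy over cosine similarities used here, the function names `cosine` and `in_batch_contrastive_loss`, and the `temperature` value of 0.05 are all illustrative assumptions rather than the authors' implementation. The toy vectors stand in for sentence embeddings that the two towers would produce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_batch_contrastive_loss(anchors, positives, temperature=0.05):
    """Mean cross-entropy in which positives[i] is the target for anchors[i]
    and every other positives[j] in the batch serves as a negative.

    Assumption: this in-batch-negatives form and the temperature are
    hypothetical choices for illustration, not taken from the paper.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Scaled similarity of this anchor to every candidate in the batch.
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over the candidates.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability assigned to the true pair.
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Toy embeddings standing in for the outputs of the two shared-weight towers.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each anchor is closest to its own pair
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each anchor is closest to the wrong pair

loss_aligned = in_batch_contrastive_loss(anchors, aligned)
loss_swapped = in_batch_contrastive_loss(anchors, swapped)
print(loss_aligned < loss_swapped)
```

Under these assumptions, the loss is small when each anchor is most similar to its own pair and large when the pairs are mismatched, which is the signal that pulls paired sentences together in the embedding space without requiring similarity labels.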
