LACoS-BLOOM: Low-rank Adaptation with Contrastive objective on 8 bits Siamese-BLOOM


Abstract

Text embeddings are useful features for several NLP applications, such as sentence similarity, text clustering, and semantic search. In this paper, we present LACoS-BLOOM, a Low-rank Adaptation with a Contrastive objective on top of 8-bit Siamese-BLOOM, a multilingual large language model optimized to produce semantically meaningful word embeddings. The innovation is threefold. First, we cast the BLOOM weights to 8-bit values. Second, we fine-tune BLOOM with a scalable adapter (LoRA) and an 8-bit Adam optimizer for sentence similarity classification. Third, we apply a Siamese architecture to the BLOOM model with a contrastive objective to mitigate the scarcity of multilingual labeled data. Experimental results show that the quality of the embeddings learned by LACoS-BLOOM is proportional to the number of model parameters and to the amount of unlabeled training data. With this parameter-efficient fine-tuning design, we are able to run the 7.1-billion-parameter BLOOM model end-to-end on a single GPU machine with 32 GB of memory. Compared with the previous solution, Sentence-BERT, we achieve a significant improvement on both English and multilingual STS tasks.
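The three ingredients named in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: every function name is hypothetical, the row-wise absmax scheme is only one common way to cast weights to 8 bits, and the in-batch-negatives loss is an assumed form of the contrastive objective, which the abstract does not specify. The LoRA forward pass follows the standard formulation y = W x + (alpha / r) B A x, in which W stays frozen and only A and B would be trained.

```python
import math

def quantize_absmax_int8(weights):
    """Map each row of floats to integers in [-127, 127] with one absmax scale per row."""
    quantized, scales = [], []
    for row in weights:
        scale = max(abs(w) for w in row) / 127 or 1.0  # avoid a zero scale for all-zero rows
        quantized.append([round(w / scale) for w in row])
        scales.append(scale)
    return quantized, scales

def dequantize(quantized, scales):
    """Recover approximate float weights from the 8-bit values and their scales."""
    return [[q * s for q in row] for row, s in zip(quantized, scales)]

def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def lora_forward(frozen_w, lora_a, lora_b, x, alpha=16.0):
    """y = W x + (alpha / r) * B (A x); W is frozen, A is (r, d_in), B is (d_out, r)."""
    rank = len(lora_a)
    base = matvec(frozen_w, x)
    update = matvec(lora_b, matvec(lora_a, x))
    return [b + (alpha / rank) * u for b, u in zip(base, update)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def in_batch_contrastive_loss(anchors, positives, scale=20.0):
    """Assumed objective: cross-entropy over scaled cosine similarities.

    Both inputs come from the same (Siamese) encoder. positives[i] is the match
    for anchors[i]; every other positive in the batch serves as a negative.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

# Toy usage: quantize a 2x2 weight matrix, then run a rank-1 LoRA forward pass.
w8, w_scales = quantize_absmax_int8([[0.5, -1.0], [0.25, 0.75]])
frozen = dequantize(w8, w_scales)
y = lora_forward(frozen, lora_a=[[0.1, 0.2]], lora_b=[[0.0], [0.0]], x=[1.0, 2.0])
```

Because B is initialised to zero here, the adapted layer starts out identical to the frozen 8-bit layer, and training would move only the small A and B matrices. That is the property that makes such fine-tuning parameter-efficient.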
