LACoS-BLOOM: Low-rank Adaptation with Contrastive objective on 8 bits Siamese-BLOOM
May 10, 2023
Authors: Wen-Yu Hua, Brian Williams, Davood Shamsi
cs.AI
Abstract
Text embeddings are useful features for several NLP applications, such as sentence similarity, text clustering, and semantic search. In this paper, we present LACoS-BLOOM, a Low-rank Adaptation with a Contrastive objective on top of 8-bit Siamese-BLOOM, a multilingual large language model optimized to produce semantically meaningful word embeddings. The innovation is threefold. First, we cast the BLOOM weights to 8-bit values. Second, we fine-tune BLOOM with a scalable adapter (LoRA) and an 8-bit Adam optimizer for sentence similarity classification. Third, we apply a Siamese architecture with a contrastive objective to the BLOOM model to ease the scarcity of multilingual labeled data. The experimental results show that the quality of the embeddings learned by LACoS-BLOOM is proportional to the number of model parameters and the amount of unlabeled training data. With the parameter-efficient fine-tuning design, we are able to run the 7.1-billion-parameter BLOOM end-to-end on a single GPU machine with 32 GB of memory. Compared to the previous solution, Sentence-BERT, we achieve significant improvements on both English and multilingual STS tasks.
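
The three ingredients described in the abstract (8-bit weight quantization, LoRA fine-tuning with an 8-bit Adam optimizer, and a Siamese encoder trained with a contrastive objective) can be sketched with off-the-shelf tooling. The snippet below is a minimal illustration, not the authors' released code: it assumes the Hugging Face transformers, peft, and bitsandbytes libraries, the checkpoint name, LoRA rank, learning rate, and temperature are illustrative choices, and the in-batch contrastive loss shown here is one common formulation that may differ in detail from the paper's objective.

import torch
import torch.nn.functional as F
import bitsandbytes as bnb
from transformers import AutoModel, AutoTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "bigscience/bloom-7b1"  # assumed checkpoint; smaller BLOOM sizes work the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)

# (1) Cast the pretrained BLOOM weights to 8-bit values at load time.
base_model = AutoModel.from_pretrained(model_name, load_in_8bit=True, device_map="auto")
base_model = prepare_model_for_kbit_training(base_model)

# (2) Attach a low-rank adapter (LoRA) so only a small fraction of parameters is trained,
#     and optimize those parameters with an 8-bit Adam optimizer.
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                         target_modules=["query_key_value"])  # BLOOM's fused attention projection
model = get_peft_model(base_model, lora_config)
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=2e-4)

def embed(texts):
    # Siamese encoding: the same weights embed both sides of every sentence pair.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt").to(model.device)
    hidden = model(**batch).last_hidden_state               # (batch, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)     # mean pooling over tokens

def contrastive_loss(anchors, positives, temperature=0.05):
    # (3) In-batch contrastive objective: each anchor should score highest against
    #     its own positive and lower against the other sentences in the batch.
    a = F.normalize(embed(anchors), dim=-1)
    p = F.normalize(embed(positives), dim=-1)
    logits = a @ p.T / temperature                           # cosine similarities as logits
    labels = torch.arange(len(anchors), device=logits.device)
    return F.cross_entropy(logits, labels)

# One illustrative training step on a tiny batch of paraphrase pairs.
loss = contrastive_loss(
    ["A man is playing a guitar.", "The weather is sunny today."],
    ["Someone plays a guitar.", "It is a sunny day."],
)
loss.backward()
optimizer.step()
optimizer.zero_grad()

Because the frozen base weights are stored in 8 bits and only the low-rank adapter and its 8-bit optimizer states require gradient memory, a setup along these lines is what makes end-to-end training of the 7.1B-parameter model feasible on a single 32 GB GPU.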