

LACoS-BLOOM: Low-rank Adaptation with Contrastive objective on 8 bits Siamese-BLOOM

May 10, 2023
Authors: Wen-Yu Hua, Brian Williams, Davood Shamsi
cs.AI

Abstract

Text embeddings are useful features for several NLP applications, such as sentence similarity, text clustering, and semantic search. In this paper, we present a Low-rank Adaptation with a Contrastive objective on top of 8-bit Siamese-BLOOM, a multilingual large language model optimized to produce semantically meaningful word embeddings. The innovation is threefold. First, we cast the BLOOM weights to 8-bit values. Second, we fine-tune BLOOM with a scalable adapter (LoRA) and an 8-bit Adam optimizer for sentence similarity classification. Third, we apply a Siamese architecture to the BLOOM model with a contrastive objective to ease the scarcity of multilingual labeled data. The experimental results show that the quality of the embeddings learned by LACoS-BLOOM is proportional to the number of model parameters and the amount of unlabeled training data. With the parameter-efficient fine-tuning design, we are able to run the 7.1-billion-parameter BLOOM end-to-end on a single GPU machine with 32GB of memory. Compared to the previous solution, Sentence-BERT, we achieve significant improvements on both English and multilingual STS tasks.
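The sketch below illustrates how the three ingredients described in the abstract fit together: loading BLOOM with 8-bit weights, attaching a LoRA adapter trained with 8-bit Adam, and optimizing a Siamese encoder with a contrastive (in-batch negatives) objective. It is a minimal illustration assuming the Hugging Face transformers, peft, and bitsandbytes libraries; the checkpoint name, LoRA hyperparameters, pooling choice, and loss details are assumptions for exposition, not the authors' exact recipe.

```python
# Illustrative sketch (not the authors' released code) of the LACoS-BLOOM ideas:
# (1) 8-bit BLOOM weights, (2) LoRA + 8-bit Adam, (3) Siamese contrastive training.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel
from peft import LoraConfig, get_peft_model
import bitsandbytes as bnb

model_name = "bigscience/bloom-7b1"  # assumed checkpoint; the paper uses BLOOM 7.1B
tokenizer = AutoTokenizer.from_pretrained(model_name)

# (1) Cast the frozen base weights to 8-bit at load time.
base = AutoModel.from_pretrained(model_name, load_in_8bit=True, device_map="auto")

# (2) Attach a low-rank adapter; only these small matrices receive gradients.
lora_cfg = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                      target_modules=["query_key_value"])  # illustrative hyperparameters
model = get_peft_model(base, lora_cfg)

# 8-bit Adam keeps optimizer-state memory small.
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-4)

def embed(sentences):
    """Encode sentences and mean-pool the last hidden states into embeddings."""
    batch = tokenizer(sentences, padding=True, truncation=True,
                      return_tensors="pt").to(base.device)
    hidden = model(**batch).last_hidden_state             # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)          # (B, T, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)   # masked mean pooling

def contrastive_loss(anchors, positives, temperature=0.05):
    """(3) Siamese setup: both sides share the encoder; other pairs in the batch
    act as negatives (in-batch negatives with a cross-entropy objective)."""
    a = F.normalize(embed(anchors), dim=-1)
    p = F.normalize(embed(positives), dim=-1)
    logits = a @ p.T / temperature                         # cosine similarity matrix
    labels = torch.arange(len(anchors), device=logits.device)
    return F.cross_entropy(logits, labels)

# One toy update step on a single (anchor, positive) pair.
loss = contrastive_loss(["A man is playing guitar."], ["Someone plays a guitar."])
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

In this sketch only the LoRA parameters are updated while the 8-bit base weights stay frozen, which is what allows a 7.1B-parameter model to be fine-tuned end-to-end within a 32GB single-GPU budget.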