Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning
August 1, 2024
Authors: Trapoom Ukarapol, Zhicheng Lee, Amy Xin
cs.AI
Abstract
While Large Language Models show remarkable performance in natural language
understanding, their resource-intensive nature makes them less accessible. In
contrast, smaller language models such as MiniCPM offer more sustainable
scalability, but often underperform without specialized optimization. In this
paper, we explore the enhancement of smaller language models through the
improvement of their text embeddings. We select three language models, MiniCPM,
Phi-2, and Gemma, to conduct contrastive fine-tuning on the NLI dataset. Our
results demonstrate that this fine-tuning method enhances the quality of text
embeddings for all three models across various benchmarks, with MiniCPM showing
the most significant improvement, an average performance gain of 56.33%. The
contrastive fine-tuning code is publicly available at
https://github.com/trapoom555/Language-Model-STS-CFT.
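To illustrate the approach described above, the sketch below shows one common form of contrastive fine-tuning on NLI-style triplets (premise, entailed hypothesis, contradicted hypothesis) with an InfoNCE loss over in-batch negatives. It is a minimal illustration, not the paper's exact recipe: the checkpoint name, mean pooling, temperature, and the example triplet are assumptions; the linked repository is the authoritative implementation.

```python
# Minimal sketch of contrastive fine-tuning for sentence embeddings on NLI-style
# triplets. Checkpoint, pooling, and hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

model_name = "microsoft/phi-2"  # assumed checkpoint; any small decoder-only LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # decoder-only LMs often lack a pad token
model = AutoModel.from_pretrained(model_name)

def embed(texts):
    """Mean-pool the last hidden states into one unit-norm embedding per text."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state              # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()   # (B, T, 1)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # masked mean over tokens
    return F.normalize(pooled, dim=-1)

def info_nce_loss(anchors, positives, negatives, temperature=0.05):
    """InfoNCE with in-batch negatives plus one hard negative per anchor."""
    a, p, n = embed(anchors), embed(positives), embed(negatives)   # each (B, H)
    # Similarity of each anchor to every positive and every hard negative.
    logits = torch.cat([a @ p.T, a @ n.T], dim=1) / temperature    # (B, 2B)
    labels = torch.arange(a.size(0))  # the matching positive sits on the diagonal
    return F.cross_entropy(logits, labels)

# One illustrative NLI triplet (premise, entailment hypothesis, contradiction hypothesis).
loss = info_nce_loss(
    ["A man is playing a guitar on stage."],
    ["A person is performing music."],
    ["The stage is completely empty."],
)
loss.backward()  # in practice, run this inside an optimizer loop (often with LoRA/PEFT)
```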