Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning
August 1, 2024
Authors: Trapoom Ukarapol, Zhicheng Lee, Amy Xin
cs.AI
Abstract
While Large Language Models show remarkable performance in natural language
understanding, their resource-intensive nature makes them less accessible. In
contrast, smaller language models such as MiniCPM offer more sustainable
scalability, but often underperform without specialized optimization. In this
paper, we explore the enhancement of smaller language models through the
improvement of their text embeddings. We select three language models, MiniCPM,
Phi-2, and Gemma, to conduct contrastive fine-tuning on the NLI dataset. Our
results demonstrate that this fine-tuning method enhances the quality of text
embeddings for all three models across various benchmarks, with MiniCPM showing
the most significant improvement, an average performance gain of 56.33%. The
contrastive fine-tuning code is publicly available at
https://github.com/trapoom555/Language-Model-STS-CFT.
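To make the method concrete, below is a minimal sketch of contrastive fine-tuning for sentence embeddings with an InfoNCE loss over in-batch negatives, the standard setup for training embeddings on NLI pairs. The model name, last-token pooling, temperature, and toy sentence pairs are illustrative assumptions rather than the authors' exact configuration; see the linked repository for the actual implementation.

# Minimal sketch: contrastive fine-tuning of a small decoder LM for text embeddings.
# Assumptions: Phi-2 as the backbone, last-token pooling, temperature 0.05, toy data.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_name = "microsoft/phi-2"  # assumed; the paper also evaluates MiniCPM and Gemma
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Phi-2 has no pad token by default
model = AutoModel.from_pretrained(model_name)

def embed(texts):
    """Encode texts and use the hidden state of the last non-padding token as the embedding."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state            # (B, T, H)
    last = batch["attention_mask"].sum(dim=1) - 1         # index of last real token per sequence
    return hidden[torch.arange(hidden.size(0)), last]     # (B, H)

def info_nce_loss(anchors, positives, temperature=0.05):
    """InfoNCE with in-batch negatives: each anchor should match only its own positive."""
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.T / temperature                         # (B, B) scaled cosine similarities
    labels = torch.arange(a.size(0))                       # diagonal entries are the correct pairs
    return F.cross_entropy(logits, labels)

# Toy NLI-style premise/entailment pairs (placeholders, not the actual dataset)
premises = ["A man is playing a guitar.", "Two dogs run through a field."]
entailments = ["Someone is making music.", "Animals are outside."]

loss = info_nce_loss(embed(premises), embed(entailments))
loss.backward()  # in practice, run inside an optimizer loop, often with parameter-efficient tuning

In a full training run, each batch of premise/entailment pairs is encoded, the InfoNCE loss pulls each premise toward its own entailment while pushing it away from the other sentences in the batch, and the improved embeddings are then evaluated on standard similarity benchmarks.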