Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning

Abstract

While large language models show remarkable performance in natural language understanding, their resource-intensive nature makes them less accessible. In contrast, smaller language models such as MiniCPM offer more sustainable scalability but often underperform without specialized optimization. In this paper, we explore enhancing smaller language models by improving their text embeddings. We select three language models, MiniCPM, Phi-2, and Gemma, and apply contrastive fine-tuning on the NLI dataset. Our results show that this fine-tuning method improves the quality of text embeddings for all three models across various benchmarks, with MiniCPM showing the largest improvement, an average performance gain of 56.33%. The contrastive fine-tuning code is publicly available at https://github.com/trapoom555/Language-Model-STS-CFT.
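To make the idea of contrastive fine-tuning on NLI pairs concrete, the following is a minimal sketch of an InfoNCE-style objective with in-batch negatives. It is an illustration only: the abstract does not state which loss the authors use, and the function names, the use of cosine similarity, and the temperature value of 0.05 are all assumptions made for this example. Each premise embedding is pulled toward the embedding of its entailed hypothesis and pushed away from the other hypotheses in the batch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors, positives, temperature=0.05):
    """Mean InfoNCE loss with in-batch negatives (hypothetical helper).

    anchors[i] and positives[i] are embeddings of a premise and its
    entailed hypothesis; positives[j] with j != i serve as negatives.
    The temperature default is an assumed value, not taken from the paper.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates in the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the true positive.
        total += log_denom - logits[i]
    return total / len(anchors)


# Toy 2-D embeddings: correctly paired vs. deliberately mismatched.
premises = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce_loss(premises, [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce_loss(premises, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy case the loss for the correctly paired batch is close to zero, while the mismatched batch is penalized heavily. In an actual fine-tuning run the embeddings would come from the language model and the loss would be minimized by gradient descent.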

Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning
