Piccolo2: General Text Embedding with Multi-task Hybrid Loss Training

May 11, 2024
Authors: Junqin Huang, Zhongjie Hu, Zihao Jing, Mengya Gao, Yichao Wu
cs.AI

Abstract

In this report, we introduce Piccolo2, an embedding model that surpasses other models in a comprehensive evaluation over 6 tasks on the CMTEB benchmark, setting a new state of the art. Piccolo2 primarily relies on an efficient multi-task hybrid loss training approach that effectively harnesses textual data and labels from diverse downstream tasks. In addition, Piccolo2 scales up the embedding dimension and uses MRL (Matryoshka Representation Learning) training to support more flexible vector dimensions. The latest information on Piccolo models is available at: https://huggingface.co/sensenova/
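The abstract does not spell out which losses make up the hybrid, so the following is only a minimal sketch under stated assumptions. It assumes retrieval-style examples (query, positive, negatives) are trained with an InfoNCE contrastive loss, and graded-similarity (STS) pairs with a CoSENT-style ranking loss. It also assumes MRL is applied by averaging the combined loss over nested prefix lengths of each embedding. The task split, weights, temperature, scale, and the helpers `retrieval_task`, `sts_task`, and `mrl_hybrid_loss` are hypothetical illustrations, not Piccolo2's actual implementation. The sketch uses plain Python lists, so it computes loss values but no gradients.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Sketch under assumptions: the loss choices (InfoNCE, CoSENT) and task names
# are illustrative and are NOT taken from the Piccolo2 report.
Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale v to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of a and b (both are re-normalized first)."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def info_nce(query: Vector, positive: Vector, negatives: Sequence[Vector],
             temperature: float = 0.05) -> float:
    """-log softmax probability of the positive among positive + negatives."""
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, n) / temperature for n in negatives]
    return logsumexp(logits) - logits[0]


def cosent(pairs: Sequence[Tuple[Vector, Vector]], labels: Sequence[float],
           scale: float = 20.0) -> float:
    """log(1 + sum exp(scale * (s_j - s_i))) over all pairs with label_i > label_j."""
    sims = [cosine(a, b) for a, b in pairs]
    terms = [0.0]  # exp(0) = 1 supplies the "1 +" inside the log
    for i, si in enumerate(sims):
        for j, sj in enumerate(sims):
            if labels[i] > labels[j]:
                terms.append(scale * (sj - si))
    return logsumexp(terms)


# Hypothetical task adapters: each truncates every embedding to its first `dim` coordinates.
def retrieval_task(example, dim: int) -> float:
    query, positive, negatives = example
    return info_nce(query[:dim], positive[:dim], [n[:dim] for n in negatives])


def sts_task(example, dim: int) -> float:
    pairs, labels = example
    return cosent([(a[:dim], b[:dim]) for a, b in pairs], labels)


TASK_LOSSES: Dict[str, Callable] = {"retrieval": retrieval_task, "sts": sts_task}


def mrl_hybrid_loss(batches: Dict[str, list], dims: Sequence[int],
                    weights: Dict[str, float]) -> float:
    """Weighted sum of mean per-task losses, averaged over nested (Matryoshka) prefix sizes."""
    total = 0.0
    for dim in dims:
        for task, examples in batches.items():
            per_task = sum(TASK_LOSSES[task](ex, dim) for ex in examples) / len(examples)
            total += weights[task] * per_task
    return total / len(dims)


if __name__ == "__main__":
    q, p, n = [1.0, 0.0, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]
    batches = {
        "retrieval": [(q, p, [n])],
        "sts": [([(q, p), (q, n)], [1.0, 0.0])],
    }
    print(mrl_hybrid_loss(batches, dims=[2, 4], weights={"retrieval": 1.0, "sts": 1.0}))
```

Because the loss is averaged over every prefix size in `dims`, training of this kind pushes the leading coordinates of each vector to be useful on their own. That is the property that lets Matryoshka-style embeddings be truncated to a smaller dimension at inference time.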
