Piccolo2: General Text Embedding with Multi-task Hybrid Loss Training
May 11, 2024
Authors: Junqin Huang, Zhongjie Hu, Zihao Jing, Mengya Gao, Yichao Wu
cs.AI
Abstract
In this report, we introduce Piccolo2, an embedding model that surpasses
other models in a comprehensive evaluation over 6 tasks on the CMTEB
benchmark, setting a new state of the art. Piccolo2 primarily leverages an
efficient multi-task hybrid loss training approach that makes effective use
of textual data and labels from diverse downstream tasks. In addition,
Piccolo2 scales up the embedding dimension and uses MRL training to support
more flexible vector dimensions. The latest information on the piccolo models
can be accessed via: https://huggingface.co/sensenova/
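
To illustrate what "more flexible vector dimensions" can mean in practice, here is a minimal sketch. It assumes that MRL refers to Matryoshka Representation Learning, where a leading prefix of the embedding is trained to remain usable on its own. The helper names and the toy vectors are hypothetical and are not taken from the Piccolo2 code or model outputs; the snippet only shows the generic truncate-then-renormalize step, not the authors' training method.

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale them to unit L2 norm."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # nothing to rescale for an all-zero prefix
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


# Toy 8-dimensional vectors standing in for model outputs (made-up values).
query = [0.9, 0.1, 0.3, 0.0, 0.2, 0.1, 0.0, 0.1]
doc = [0.8, 0.2, 0.2, 0.1, 0.0, 0.3, 0.1, 0.0]

# Compare the same pair at several (hypothetical) embedding sizes.
for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    d = truncate_and_normalize(doc, dim)
    print(dim, round(cosine(q, d), 4))
```

With a model trained this way, shorter prefixes trade some accuracy for lower storage and faster similarity search; how much accuracy is lost at each size is an empirical question that this sketch does not answer.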