FinMTEB: Finance Massive Text Embedding Benchmark

February 16, 2025
Authors: Yixuan Tang, Yi Yang
cs.AI

Abstract

Embedding models play a crucial role in representing and retrieving information across various NLP applications. Recent advances in large language models (LLMs) have further enhanced the performance of embedding models. While these models are often benchmarked on general-purpose datasets, real-world applications demand domain-specific evaluation. In this work, we introduce the Finance Massive Text Embedding Benchmark (FinMTEB), a specialized counterpart to MTEB designed for the financial domain. FinMTEB comprises 64 financial domain-specific embedding datasets across 7 tasks that cover diverse text types in both Chinese and English, such as financial news articles, corporate annual reports, ESG reports, regulatory filings, and earnings call transcripts. We also develop a finance-adapted model, FinPersona-E5, trained on data from a persona-based synthesis method that covers diverse financial embedding tasks. Through extensive evaluation of 15 embedding models, including FinPersona-E5, we show three key findings: (1) performance on general-purpose benchmarks shows limited correlation with performance on financial domain tasks; (2) domain-adapted models consistently outperform their general-purpose counterparts; and (3) surprisingly, a simple Bag-of-Words (BoW) approach outperforms sophisticated dense embeddings in financial Semantic Textual Similarity (STS) tasks, underscoring current limitations in dense embedding techniques. Our work establishes a robust evaluation framework for financial NLP applications and provides crucial insights for developing domain-specific embedding models.
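To make the third finding concrete, the following is a minimal sketch of what a BoW baseline for an STS task could look like. It is not the authors' implementation: the tokenizer, the helper names (`bow_vector`, `cosine`, `ranks`, `spearman`), and the three sentence pairs with their gold scores are all assumptions made up for illustration. It also assumes the common STS setup in which predicted cosine similarities are compared against gold scores by Spearman rank correlation.

```python
import math
import re
from collections import Counter


def bow_vector(text):
    """Lowercase, split on non-alphanumerics, and count tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def ranks(values):
    """Average 1-based ranks, so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical sentence pairs with made-up gold scores (not FinMTEB data).
pairs = [
    ("Revenue increased 10% year over year.",
     "Revenue increased 10% year over year.", 5.0),
    ("Net income rose in the third quarter.",
     "Net income fell in the third quarter.", 2.0),
    ("The board approved a dividend.",
     "Interest rates were left unchanged.", 0.0),
]
predicted = [cosine(bow_vector(a), bow_vector(b)) for a, b, _ in pairs]
gold = [g for _, _, g in pairs]
print(spearman(predicted, gold))
```

A dense-embedding model would plug into the same loop by replacing `bow_vector` and `cosine` with the model's encoder and a dense cosine similarity, which is one way the two approaches could be compared on equal footing.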
