FinMTEB: Finance Massive Text Embedding Benchmark

Abstract

Embedding models play a crucial role in representing and retrieving information across a wide range of NLP applications. Recent advances in large language models (LLMs) have further improved their performance. These models are usually benchmarked on general-purpose datasets, yet real-world applications demand domain-specific evaluation. In this work, we introduce the Finance Massive Text Embedding Benchmark (FinMTEB), a specialized counterpart to MTEB designed for the financial domain. FinMTEB comprises 64 financial domain-specific embedding datasets across 7 tasks, in both Chinese and English, covering diverse text types such as financial news articles, corporate annual reports, ESG reports, regulatory filings, and earnings call transcripts. We also develop a finance-adapted model, FinPersona-E5, trained on data generated by a persona-based data synthesis method so that the training set covers diverse financial embedding tasks. Through extensive evaluation of 15 embedding models, including FinPersona-E5, we report three key findings: (1) performance on general-purpose benchmarks shows limited correlation with performance on financial domain tasks; (2) domain-adapted models consistently outperform their general-purpose counterparts; and (3) surprisingly, a simple Bag-of-Words (BoW) approach outperforms sophisticated dense embeddings on financial Semantic Textual Similarity (STS) tasks, underscoring current limitations of dense embedding techniques. Our work establishes a robust evaluation framework for financial NLP applications and provides crucial insights for developing domain-specific embedding models.
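To make finding (3) concrete, the following is a minimal sketch of what a BoW baseline for an STS task could look like. It is not the authors' implementation. It assumes the MTEB convention of scoring STS by the Spearman correlation between predicted cosine similarities and gold similarity labels. The tokenizer, the function names, and the three sentence pairs with their gold scores are all hypothetical, invented purely for illustration.

```python
import math
import re
from collections import Counter


def bow_vector(text: str) -> Counter:
    """Lowercase the text, split on non-alphanumerics, and count tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


# Hypothetical toy data: (sentence 1, sentence 2, gold similarity).
pairs = [
    ("Revenue increased 10% year over year.",
     "Revenue increased 10% year over year.", 5.0),
    ("Net income rose in the fourth quarter.",
     "Net income rose sharply in the fourth quarter.", 4.0),
    ("The board approved a dividend increase.",
     "Interest rates were left unchanged.", 0.5),
]

scores = [cosine(bow_vector(s1), bow_vector(s2)) for s1, s2, _ in pairs]
gold = [g for _, _, g in pairs]
print("BoW cosine scores:", [round(s, 3) for s in scores])
print("Spearman vs. gold:", round(spearman(scores, gold), 3))
```

A dense model would be scored the same way under this assumption, with `bow_vector` swapped for the model's encoder, which puts the two approaches on the same metric.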
