Influence Guided Sampling for Domain Adaptation of Text Retrievers

Abstract



General-purpose open-domain dense retrieval systems are usually trained on a large, eclectic mix of corpora and search tasks. How should these diverse corpora and tasks be sampled for training? Conventional approaches either sample uniformly over instances, so that each dataset is drawn in proportion to its size, or rely on weights assigned by human experts. The training data sampling strategy is well known to have a large effect on model performance, but how to find the optimal strategy has not been adequately studied for embedding models. We propose Inf-DDS, a novel reinforcement-learning-driven sampling framework that adaptively reweights training datasets using influence-based reward signals and is much lighter on GPU compute. Our technique iteratively refines the sampling policy, prioritizing datasets that maximize model performance on a target development set. Across a wide range of text retrieval tasks, our sampling strategy yields strong improvements in retrieval performance and better adaptation than existing gradient-based sampling methods, while being 1.5x to 4x cheaper in GPU compute. Even when starting from expert-assigned weights over a large pool of training datasets, it achieves an absolute NDCG@10 improvement of 5.03 points when training a multilingual bge-m3 model and of 0.94 points when training all-MiniLM-L6-v2.
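The abstract does not specify the influence estimate or the policy-update rule. The sketch below is therefore one plausible reading, not the authors' implementation, and every name in it is hypothetical. It rests on three assumptions. First, a toy linear-regression model stands in for the retriever, and two synthetic datasets stand in for the training pool. Second, the reward for a sampled dataset is a first-order influence estimate: the step size times the dot product between that dataset's batch gradient and the development-set gradient, which approximates how much one SGD step on the batch would lower the development loss. Third, the sampling policy is a softmax over per-dataset logits. It is initialized from prior weights, such as size-proportional or expert-assigned ones, and updated with REINFORCE using a moving-average baseline.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mse_grad(w, batch):
    # Gradient of the mean of 0.5 * (w.x - y)^2 over the batch, w.r.t. w.
    g = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            g[i] += err * xi / len(batch)
    return g


def make_dataset(rng, true_w, n):
    # Hypothetical synthetic "domain": Gaussian inputs, noiseless linear targets.
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in true_w]
        data.append((x, sum(a * b for a, b in zip(true_w, x))))
    return data


def influence_guided_sampling(datasets, dev, prior, steps=300, lr=0.05,
                              policy_lr=0.5, batch_size=8, seed=0):
    # Hypothetical sketch: REINFORCE over datasets with an influence reward.
    rng = random.Random(seed)
    w = [0.0] * len(dev[0][0])             # toy model parameters
    logits = [math.log(p) for p in prior]  # policy starts at the prior weights
    baseline = 0.0                         # moving-average reward baseline
    for _ in range(steps):
        probs = softmax(logits)
        k = rng.choices(range(len(datasets)), weights=probs)[0]
        batch = rng.sample(datasets[k], batch_size)
        g_train = mse_grad(w, batch)
        g_dev = mse_grad(w, dev)
        # First-order influence: predicted drop in dev loss from one SGD step on this batch.
        reward = lr * sum(a * b for a, b in zip(g_train, g_dev))
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        # REINFORCE update: d log pi(k) / d logit_j = [j == k] - pi(j).
        for j in range(len(logits)):
            logits[j] += policy_lr * advantage * ((1.0 if j == k else 0.0) - probs[j])
        # Train the toy model on the sampled batch.
        w = [wi - lr * gi for wi, gi in zip(w, g_train)]
    return softmax(logits), w


data_rng = random.Random(42)
dev = make_dataset(data_rng, [1.0, 2.0], 32)
in_domain = make_dataset(data_rng, [1.0, 2.0], 200)     # agrees with the dev set
off_domain = make_dataset(data_rng, [-2.0, -1.0], 800)  # conflicts with the dev set
prior = [0.2, 0.8]                                      # size-proportional start
probs, w = influence_guided_sampling([in_domain, off_domain], dev, prior)
print("learned sampling weights:", [round(p, 3) for p in probs])
```

In this toy setup, the larger off-domain dataset starts with most of the sampling mass. Its gradients point away from the development objective, however, so it earns negative rewards, and the policy should shift probability toward the smaller in-domain dataset. Scoring datasets with gradient dot products, rather than full retraining, illustrates why influence-style rewards can be cheap. The GPU savings reported above come from the authors' actual method, not from this sketch.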
