Low-Resource Machine Translation through the Lens of Personalized Federated Learning

Abstract

We present a new approach based on the Personalized Federated Learning algorithm MeritFed that can be applied to natural language processing tasks with heterogeneous data. We evaluate it on the low-resource machine translation task, using the dataset from the Large-Scale Multilingual Machine Translation Shared Task (Small Track #2) and the subset of Sami languages from the multilingual benchmark for Finno-Ugric languages. In addition to being effective, MeritFed is highly interpretable, since it can be used to track the impact of each language used for training. Our analysis reveals that the target dataset size affects the weight distribution across auxiliary languages, that unrelated languages do not interfere with training, and that auxiliary optimizer parameters have minimal impact. Our approach is easy to apply with a few lines of code, and we provide scripts for reproducing the experiments at https://github.com/VityaVitalich/MeritFed.
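
To make the idea of per-language weights concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation (that lives in the linked repository). It assumes that each training language contributes a gradient, that the gradients are combined with weights on the probability simplex, and that an auxiliary exponentiated-gradient (mirror descent) step tunes those weights to lower the target language's loss. The function names, the quadratic toy losses, and the hyperparameters `lr`, `md_lr`, and `md_steps` are all hypothetical.

```python
import math


def aggregate(grads, weights):
    """Weighted sum of per-language gradients (each a list of floats)."""
    dim = len(grads[0])
    return [sum(w * g[k] for w, g in zip(weights, grads)) for k in range(dim)]


def weighted_step(x, grads, weights, target_grad_fn, lr=0.1, md_lr=1.0, md_steps=10):
    """Tune simplex weights on the target loss, then update the parameters x."""
    w = list(weights)
    for _ in range(md_steps):
        d = aggregate(grads, w)
        x_try = [xk - lr * dk for xk, dk in zip(x, d)]
        g_val = target_grad_fn(x_try)
        # d/dw_i of f_target(x - lr * sum_j w_j g_j) equals -lr * <g_val, g_i>.
        w_grad = [-lr * sum(a * b for a, b in zip(g_val, g)) for g in grads]
        # Exponentiated-gradient step keeps weights positive;
        # renormalising keeps them on the simplex.
        w = [wi * math.exp(-md_lr * gi) for wi, gi in zip(w, w_grad)]
        total = sum(w)
        w = [wi / total for wi in w]
    d = aggregate(grads, w)
    return [xk - lr * dk for xk, dk in zip(x, d)], w


def run_demo(steps=50):
    # Toy "languages": loss_i(x) = 0.5 * ||x - c_i||^2, so grad_i(x) = x - c_i.
    # Index 0 is the target, 1 a related language, 2 an unrelated one.
    centers = [[1.0], [0.8], [-5.0]]

    def target_grad(p):
        return [pk - ck for pk, ck in zip(p, centers[0])]

    x = [0.0]
    w = [1.0 / 3.0] * 3
    for _ in range(steps):
        grads = [[xk - ck for xk, ck in zip(x, c)] for c in centers]
        x, w = weighted_step(x, grads, w, target_grad)
    return x, w


if __name__ == "__main__":
    params, lang_weights = run_demo()
    print("parameters:", params)
    print("weights (target, related, unrelated):", lang_weights)
```

In this toy setting the learned weights act as the interpretable quantity: the unrelated language's weight shrinks relative to the others because following its gradient raises the target loss. Real translation models would replace the quadratic losses with mini-batch gradients, which this sketch does not attempt.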

