Low-Resource Machine Translation through the Lens of Personalized Federated Learning
June 18, 2024
Authors: Viktor Moskvoretskii, Nazarii Tupitsa, Chris Biemann, Samuel Horváth, Eduard Gorbunov, Irina Nikishina
cs.AI
Abstract
We present a new approach based on the Personalized Federated Learning
algorithm MeritFed that can be applied to Natural Language Tasks with
heterogeneous data. We evaluate it on the Low-Resource Machine Translation
task, using the dataset from the Large-Scale Multilingual Machine Translation
Shared Task (Small Track #2) and the subset of Sami languages from the
multilingual benchmark for Finno-Ugric languages. In addition to its
effectiveness, MeritFed is also highly interpretable, as it can be applied to
track the impact of each language used for training. Our analysis reveals that
target dataset size affects weight distribution across auxiliary languages,
that unrelated languages do not interfere with training, and that auxiliary
optimizer parameters have minimal impact. Our approach is easy to apply with a
few lines of code, and we provide scripts for reproducing the experiments at
https://github.com/VityaVitalich/MeritFed
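
The abstract states that the method can be added with a few lines of code and that MeritFed learns how much each auxiliary language should contribute to training on the target language. The self-contained sketch below is only an illustration of that general idea under explicit assumptions, not the implementation from the linked repository: per-language gradients are mixed with weights on the probability simplex, and the weights are updated with an exponentiated-gradient (mirror-descent) step driven by the loss on a small held-out target-language batch. The synthetic data, the linear stand-in model, and the helpers `language_grads` and `apply_update` are all invented for the example.

```python
import torch

# --- Illustrative setup: everything below is synthetic, not taken from the paper ---
torch.manual_seed(0)
dim, n_langs = 16, 4
model = torch.nn.Linear(dim, dim)      # stand-in for a translation model
loss_fn = torch.nn.MSELoss()

# One training batch per (target + auxiliary) language, plus a small
# held-out batch in the target language that drives the weight updates.
lang_batches = [(torch.randn(32, dim), torch.randn(32, dim)) for _ in range(n_langs)]
dev_x, dev_y = torch.randn(8, dim), torch.randn(8, dim)

weights = torch.full((n_langs,), 1.0 / n_langs)   # aggregation weights on the simplex
lr, weight_lr = 1e-2, 1e-1


def flat_grad():
    """Concatenate the current gradients of all model parameters into one vector."""
    return torch.cat([p.grad.flatten() for p in model.parameters()])


def language_grads():
    """Compute one flattened gradient per language."""
    grads = []
    for x, y in lang_batches:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        grads.append(flat_grad())
    return torch.stack(grads)            # shape: (n_langs, n_params)


def apply_update(update, step_size):
    """Manual SGD step using a flat parameter-space update vector."""
    offset = 0
    with torch.no_grad():
        for p in model.parameters():
            n = p.numel()
            p -= step_size * update[offset:offset + n].view_as(p)
            offset += n


for step in range(100):
    grads = language_grads()

    # Gradient of the target-language dev loss at the current parameters.
    model.zero_grad()
    loss_fn(model(dev_x), dev_y).backward()
    g_dev = flat_grad()

    # First-order estimate of how the dev loss changes with each weight:
    # languages whose gradients align with the dev gradient get negative values.
    w_grad = -lr * (grads @ g_dev)

    # Exponentiated-gradient (mirror-descent) step keeps the weights on the simplex.
    weights = weights * torch.exp(-weight_lr * w_grad)
    weights = weights / weights.sum()

    # Update the model with the weighted mixture of per-language gradients.
    apply_update(weights @ grads, lr)

    if step % 25 == 0:
        print(step, [round(w.item(), 3) for w in weights])
```

Tracking `weights` over the run is the analogue of the interpretability analysis described in the abstract: at any point it shows how much each language is currently contributing to the model update.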