Low-Resource Machine Translation through the Lens of Personalized Federated Learning
June 18, 2024
Authors: Viktor Moskvoretskii, Nazarii Tupitsa, Chris Biemann, Samuel Horváth, Eduard Gorbunov, Irina Nikishina
cs.AI
Abstract
We present a new approach based on the Personalized Federated Learning
algorithm MeritFed that can be applied to Natural Language Tasks with
heterogeneous data. We evaluate it on the Low-Resource Machine Translation
task, using the dataset from the Large-Scale Multilingual Machine Translation
Shared Task (Small Track #2) and the subset of Sami languages from the
multilingual benchmark for Finno-Ugric languages. In addition to its
effectiveness, MeritFed is also highly interpretable, as it can be applied to
track the impact of each language used for training. Our analysis reveals that
target dataset size affects weight distribution across auxiliary languages,
that unrelated languages do not interfere with training, and that auxiliary
optimizer parameters have minimal impact. Our approach is easy to apply with a
few lines of code, and we provide scripts for reproducing the experiments at
https://github.com/VityaVitalich/MeritFed
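
The abstract states that the method can be added with a few lines of code and that MeritFed learns how much each auxiliary language should contribute to training on the target language. The self-contained sketch below is only an illustration of that general idea under explicit assumptions, not the implementation from the linked repository: per-language gradients are mixed with weights on the probability simplex, and the weights are updated with an exponentiated-gradient (mirror-descent) step driven by the loss on a small held-out target-language batch. The synthetic data, the linear stand-in model, and the helpers `language_grads` and `apply_update` are all invented for the example.

```python
import torch

# --- Illustrative setup: everything below is synthetic, not taken from the paper ---
torch.manual_seed(0)
dim, n_langs = 16, 4
model = torch.nn.Linear(dim, dim)      # stand-in for a translation model
loss_fn = torch.nn.MSELoss()

# One training batch per (target + auxiliary) language, plus a small
# held-out batch in the target language that drives the weight updates.
lang_batches = [(torch.randn(32, dim), torch.randn(32, dim)) for _ in range(n_langs)]
dev_x, dev_y = torch.randn(8, dim), torch.randn(8, dim)

weights = torch.full((n_langs,), 1.0 / n_langs)   # aggregation weights on the simplex
lr, weight_lr = 1e-2, 1e-1


def flat_grad():
    """Concatenate the current gradients of all model parameters into one vector."""
    return torch.cat([p.grad.flatten() for p in model.parameters()])


def language_grads():
    """Compute one flattened gradient per language."""
    grads = []
    for x, y in lang_batches:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        grads.append(flat_grad())
    return torch.stack(grads)            # shape: (n_langs, n_params)


def apply_update(update, step_size):
    """Manual SGD step using a flat parameter-space update vector."""
    offset = 0
    with torch.no_grad():
        for p in model.parameters():
            n = p.numel()
            p -= step_size * update[offset:offset + n].view_as(p)
            offset += n


for step in range(100):
    grads = language_grads()

    # Gradient of the target-language dev loss at the current parameters.
    model.zero_grad()
    loss_fn(model(dev_x), dev_y).backward()
    g_dev = flat_grad()

    # First-order estimate of how the dev loss changes with each weight:
    # languages whose gradients align with the dev gradient get negative values.
    w_grad = -lr * (grads @ g_dev)

    # Exponentiated-gradient (mirror-descent) step keeps the weights on the simplex.
    weights = weights * torch.exp(-weight_lr * w_grad)
    weights = weights / weights.sum()

    # Update the model with the weighted mixture of per-language gradients.
    apply_update(weights @ grads, lr)

    if step % 25 == 0:
        print(step, [round(w.item(), 3) for w in weights])
```

Tracking `weights` over the run is the analogue of the interpretability analysis described in the abstract: at any point it shows how much each language is currently contributing to the model update.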