Low-Resource Machine Translation through the Lens of Personalized Federated Learning

June 18, 2024
Authors: Viktor Moskvoretskii, Nazarii Tupitsa, Chris Biemann, Samuel Horváth, Eduard Gorbunov, Irina Nikishina
cs.AI

Abstract

We present a new approach based on the Personalized Federated Learning algorithm MeritFed that can be applied to natural language tasks with heterogeneous data. We evaluate it on the low-resource machine translation task, using the dataset from the Large-Scale Multilingual Machine Translation Shared Task (Small Track #2) and the subset of Sami languages from the multilingual benchmark for Finno-Ugric languages. In addition to being effective, MeritFed is highly interpretable, as it can be used to track the impact of each language used for training. Our analysis reveals that the target dataset size affects the weight distribution across auxiliary languages, that unrelated languages do not interfere with training, and that auxiliary optimizer parameters have minimal impact. Our approach is easy to apply with a few lines of code, and we provide scripts for reproducing the experiments at https://github.com/VityaVitalich/MeritFed
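
The abstract does not spell out the update rule, so the following is only a minimal sketch of one plausible reading of the idea: per-language gradients are combined with weights constrained to the probability simplex, and an auxiliary optimizer (here exponentiated-gradient mirror descent, an assumption) adjusts those weights to reduce the loss on the target language. The toy quadratic losses and all names (`update_weights`, `run_toy_example`, `aux_lr`, `aux_steps`) are hypothetical and are not taken from the authors' repository, which should be consulted for the actual implementation.

```python
import numpy as np

def update_weights(weights, grads, target_grad_fn, params, lr, aux_lr, aux_steps):
    """Exponentiated-gradient steps on the simplex, reducing the target loss.

    grads has shape (n_languages, dim); row i is the gradient from language i.
    """
    w = weights.copy()
    for _ in range(aux_steps):
        # Parameters we would reach with the current weighted gradient step.
        candidate = params - lr * (w @ grads)
        g_target = target_grad_fn(candidate)
        # Chain rule: d(target loss)/d(w_i) = -lr * <g_target, grads[i]>.
        grad_w = -lr * (grads @ g_target)
        w = w * np.exp(-aux_lr * grad_w)
        w /= w.sum()  # renormalize so the weights stay on the simplex
    return w

def run_toy_example(steps=100, lr=0.1, aux_lr=1.0, aux_steps=5):
    # Toy stand-ins for languages: loss_i(x) = 0.5 * ||x - optima[i]||^2.
    # Row 0 is the target, row 1 a related language, row 2 an unrelated one.
    optima = np.array([[1.0, -1.0], [1.1, -0.9], [-5.0, 5.0]])
    target = optima[0]
    params = np.zeros(2)
    weights = np.full(len(optima), 1.0 / len(optima))
    for _ in range(steps):
        grads = params - optima  # exact gradients of the quadratic losses
        weights = update_weights(
            weights, grads, lambda p: p - target, params, lr, aux_lr, aux_steps
        )
        params = params - lr * (weights @ grads)
    return params, weights

if __name__ == "__main__":
    final_params, final_weights = run_toy_example()
    print("weights per language:", final_weights)
    print("parameters:", final_params)
```

In this toy setting the learned weights double as an interpretability signal: the weight on the unrelated third "language" shrinks toward zero, which mirrors the abstract's claim that unrelated languages do not interfere with training. Real use would replace the exact quadratic gradients with stochastic gradients from each language's data.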

