Low-Resource Machine Translation through the Lens of Personalized Federated Learning

Abstract

We present a new approach based on MeritFed, a personalized federated learning algorithm that can be applied to natural language processing tasks with heterogeneous data. We evaluate it on low-resource machine translation, using the dataset from the Large-Scale Multilingual Machine Translation Shared Task (Small Track #2) and the subset of Sami languages from the multilingual benchmark for Finno-Ugric languages. Beyond its effectiveness, MeritFed is highly interpretable: it can be used to track the impact of each language used for training. Our analysis shows that the size of the target dataset affects how weight is distributed across the auxiliary languages, that unrelated languages do not interfere with training, and that the parameters of the auxiliary optimizer have minimal impact. The approach can be applied with a few lines of code, and scripts for reproducing the experiments are available at https://github.com/VityaVitalich/MeritFed
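The abstract says that MeritFed assigns weights to auxiliary languages during training and that an auxiliary optimizer is involved. The following Python sketch is for illustration only. It assumes a weighting scheme that reflects our reading of MeritFed, not the code in the repository above: at every step, the per-language gradients are combined using weights on the probability simplex. Those weights are tuned by a few mirror-descent (exponentiated-gradient) steps that reduce the target language's loss after the tentative update. The quadratic losses, the three toy "languages", and all names and hyperparameters (`lr`, `inner_lr`, `inner_steps`) are hypothetical.

```python
import math

# Hypothetical toy setting, NOT the MeritFed reference implementation.
# Each "language" i has the quadratic loss f_i(x) = 0.5 * ||x - a_i||^2,
# so its gradient is x - a_i. Index 0 is the low-resource target language.
CENTERS = [
    (1.0, 1.0),    # target language
    (1.2, 0.9),    # related auxiliary language (optimum close to the target)
    (-5.0, -4.0),  # unrelated auxiliary language (optimum far away)
]


def grad(x, center):
    """Gradient of 0.5 * ||x - center||^2 with respect to x."""
    return [xi - ci for xi, ci in zip(x, center)]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def aggregate(weights, grads):
    """Weighted sum of the per-language gradients."""
    return [sum(w * g[k] for w, g in zip(weights, grads)) for k in range(len(grads[0]))]


def update_weights(x, grads, weights, lr, inner_lr, inner_steps):
    """Assumed auxiliary optimizer: exponentiated-gradient (mirror descent)
    steps on the simplex that approximately minimise
    phi(w) = f_target(x - lr * sum_i w_i * g_i)."""
    for _ in range(inner_steps):
        y = [xi - lr * di for xi, di in zip(x, aggregate(weights, grads))]
        target_grad = grad(y, CENTERS[0])
        # Chain rule: d phi / d w_i = -lr * <grad f_target(y), g_i>
        partials = [-lr * dot(target_grad, g) for g in grads]
        scaled = [w * math.exp(-inner_lr * p) for w, p in zip(weights, partials)]
        total = sum(scaled)
        weights = [s / total for s in scaled]  # renormalise onto the simplex
    return weights


def train(outer_steps=100, lr=0.5, inner_lr=1.0, inner_steps=5):
    x = [0.0, 0.0]
    weights = [1.0 / len(CENTERS)] * len(CENTERS)  # start from uniform weights
    for _ in range(outer_steps):
        grads = [grad(x, c) for c in CENTERS]
        # Warm-start the weights from the previous step, then refine them.
        weights = update_weights(x, grads, weights, lr, inner_lr, inner_steps)
        x = [xi - lr * di for xi, di in zip(x, aggregate(weights, grads))]
    return x, weights


if __name__ == "__main__":
    x, weights = train()
    print("parameters:", [round(v, 3) for v in x])
    print("weights (target, related, unrelated):", [round(w, 4) for w in weights])
```

In this toy run, the weight on the unrelated language collapses toward zero within the first few steps. The final weights also show which "languages" contributed to the update, which is the kind of per-language tracking the abstract describes. Here, `inner_lr` and `inner_steps` stand in for what we assume the auxiliary optimizer's parameters to be.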