Lius: Continual Instruction Tuning with a Pedagogical Linguistic Approach on a Translation Model for Kupang Malay

Abstract

Large Language Models (LLMs) offer new potential for translation tasks, but their performance often degrades on low-resource languages. To address this limitation, we propose an approach for fine-tuning LLMs on a low-resource language, Kupang Malay. Our approach involves designing a set of instructions that leverage explicit lexical and semantic features from a bilingual dictionary, and introducing Continual Instruction Tuning (CIT), a learning paradigm that enables iterative instruction-based training. Experimental results show that our model, named Lius, outperforms standard instruction-tuned models by 4 to 6 points and surpasses both Neural Machine Translation (NMT) and multilingual LLM models by 10 to 13 points on several evaluation metrics. These findings highlight the potential of our approach to reduce the reliance on large-scale parallel data in low-resource language translation.
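
As a rough illustration of the two ideas summarized above, the sketch below turns bilingual dictionary entries into instruction records and splits them into consecutive stages for iterative tuning. It is a minimal sketch under stated assumptions, not the procedure used for Lius: the `DictEntry` schema, the prompt wording, the English target language, the `stage_size` parameter, and the sample entries are all hypothetical and are not taken from the paper's dictionary or training setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictEntry:
    """One bilingual dictionary entry (hypothetical schema)."""
    headword: str  # Kupang Malay word
    gloss: str     # target-language equivalent (English assumed here)
    pos: str       # part of speech, used as an explicit lexical feature


def build_instruction(entry: DictEntry) -> dict:
    """Turn a dictionary entry into one instruction-tuning record."""
    return {
        "instruction": (
            f"Translate the Kupang Malay {entry.pos} "
            f"'{entry.headword}' into English."
        ),
        "input": entry.headword,
        "output": entry.gloss,
    }


def continual_stages(records: list, stage_size: int) -> list:
    """Split records into consecutive stages, one per tuning round."""
    return [
        records[i:i + stage_size]
        for i in range(0, len(records), stage_size)
    ]


# Illustrative entries only; not drawn from the paper's data.
entries = [
    DictEntry("beta", "I", "pronoun"),
    DictEntry("sonde", "not", "adverb"),
    DictEntry("lu", "you", "pronoun"),
]

records = [build_instruction(e) for e in entries]
stages = continual_stages(records, stage_size=2)

for round_index, stage in enumerate(stages, start=1):
    # A real pipeline would fine-tune the model on `stage` here.
    print(f"round {round_index}: {len(stage)} instruction(s)")
```

Each stage would be passed to a fine-tuning step in turn, so later rounds build on the model produced by earlier ones; how the stages are actually ordered and sized in CIT is not specified in this abstract.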