Lius: Instructional Linguistics for a Translation Model, Applying Continual Instruction Tuning to Kupang Malay

Abstract

Large Language Models (LLMs) offer new potential for translation tasks, but their performance often degrades on low-resource languages. To address this limitation, we propose an approach for fine-tuning LLMs on a low-resource language, Kupang Malay. The approach has two components: a set of instructions designed around explicit lexical and semantic features drawn from a bilingual dictionary, and Continual Instruction Tuning (CIT), a training paradigm that applies instruction-based training iteratively. Experimental results show that our model, Lius, outperforms standard instruction-tuned models by 4-6 points and surpasses both Neural Machine Translation (NMT) and multilingual LLM baselines by 10-13 points across several evaluation metrics. These findings suggest that our approach can reduce the reliance on large-scale parallel data in low-resource language translation.