Lius: A Linguistically Informed Translation Model Using Continual Instruction Tuning for Kupang Malay

Abstract

Large language models (LLMs) open new possibilities for translation, but their performance often degrades on low-resource languages. To address this limitation, we propose an approach for fine-tuning LLMs on a low-resource language, Kupang Malay. We design a set of instructions that draw on explicit lexical and semantic features from a bilingual dictionary, and we introduce Continual Instruction Tuning (CIT), a training paradigm that enables iterative instruction-based training. Experimental results show that our model, Lius, outperforms standard instruction-tuned models by 4-6 points and surpasses both Neural Machine Translation (NMT) and multilingual LLM models by 10-13 points on several evaluation metrics. These findings highlight the potential of our approach to reduce the reliance on large-scale parallel data in low-resource language translation.