Language on Demand, Knowledge at Core: Composing LLMs with Encoder-Decoder Translation Models for Extensible Multilinguality

Abstract

Large language models (LLMs) exhibit strong general intelligence, yet their multilingual performance remains highly imbalanced. Although LLMs encode substantial cross-lingual knowledge in a unified semantic space, they often struggle to reliably interface this knowledge with low-resource or unseen languages. Fortunately, pretrained encoder-decoder translation models already possess balanced multilingual capability, suggesting a natural complement to LLMs. In this work, we propose XBridge, a compositional encoder-LLM-decoder architecture that offloads multilingual understanding and generation to external pretrained translation models, while preserving the LLM as an English-centric core for general knowledge processing. To address the resulting representation misalignment across models, we introduce lightweight cross-model mapping layers and an optimal transport-based alignment objective, enabling fine-grained semantic consistency for multilingual generation. Experiments on four LLMs across multilingual understanding, reasoning, summarization, and generation indicate that XBridge outperforms strong baselines, especially on low-resource and previously unseen languages, without retraining the LLM.
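To make the idea of an optimal transport-based alignment objective more concrete, the following is a minimal sketch, not the XBridge objective itself. The abstract does not specify the cost function, the marginals, or the solver, so this example assumes a cosine-distance cost, uniform marginals over tokens, and entropy-regularized transport solved with Sinkhorn iterations. The function names (`cosine_cost`, `sinkhorn_alignment`) and the parameters (`epsilon`, `n_iters`) are hypothetical and chosen only for illustration.

```python
import math


def cosine_cost(x, y):
    """Cost between two vectors: 1 - cosine similarity (0 when aligned)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / max(norm, 1e-12)


def sinkhorn_alignment(src, tgt, epsilon=0.1, n_iters=200):
    """Entropy-regularized OT between two sets of token vectors.

    Assumptions (not taken from the paper): uniform token weights,
    cosine-distance cost, a fixed number of Sinkhorn iterations.
    Returns (transport cost, transport plan).
    """
    n, m = len(src), len(tgt)
    cost = [[cosine_cost(s, t) for t in tgt] for s in src]
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    a = [1.0 / n] * n  # uniform mass on source tokens
    b = [1.0 / m] * m  # uniform mass on target tokens
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    loss = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return loss, plan


if __name__ == "__main__":
    llm_states = [[1.0, 0.0], [0.0, 1.0]]      # toy "LLM-side" token vectors
    decoder_states = [[1.0, 0.0], [0.0, 1.0]]  # toy "decoder-side" vectors
    loss, _ = sinkhorn_alignment(llm_states, decoder_states)
    print(f"alignment loss: {loss:.6f}")
```

In a trainable setting, a loss of this kind would be computed on the outputs of the mapping layers and minimized with an autodiff framework; the pure-Python version above only illustrates the computation on toy vectors.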
