Do GPTs Produce Less Literal Translations?

Abstract

Large Language Models (LLMs) such as GPT-3 have emerged as general-purpose language models capable of addressing many natural language generation or understanding tasks. On the task of Machine Translation (MT), multiple works have investigated few-shot prompting mechanisms to elicit better translations from LLMs. However, there has been relatively little investigation into how such translations differ qualitatively from those generated by standard Neural Machine Translation (NMT) models. In this work, we investigate these differences in terms of the literalness of translations produced by the two systems. Using literalness measures involving word alignment and monotonicity, we find that translations out of English (E-X) from GPTs tend to be less literal, while exhibiting similar or better scores on MT quality metrics. We demonstrate that this finding is borne out in human evaluations as well. We then show that these differences are especially pronounced when translating sentences that contain idiomatic expressions.
