On Non-interactive Evaluation of Animal Communication Translators

Abstract

If you had an AI Whale-to-English translator, how could you validate whether it is working? Does one need to interact with the animals, or rely on grounded observations such as temperature? We provide theoretical and proof-of-concept experimental evidence suggesting that interaction, and even observation, may not be necessary for sufficiently complex languages. One may be able to evaluate translators solely by their English outputs, which offers potential advantages in terms of safety, ethics, and cost. This is an instance of machine translation quality evaluation (MTQE) in which no reference translations are available. A key challenge is identifying "hallucinations": false translations that may appear fluent and plausible. We propose using segment-by-segment translation together with the classic NLP shuffle test to evaluate translators. The idea is to translate animal communication turn by turn and measure how often the resulting translations make more sense in their original order than when permuted. Proof-of-concept experiments on data-scarce human languages and constructed languages demonstrate the potential utility of this evaluation methodology. These human-language experiments serve solely to validate our reference-free metric under data scarcity. The metric is found to correlate highly with a standard evaluation based on reference translations, which are available in our experiments. We also perform a theoretical analysis suggesting that interaction may be neither necessary nor efficient in the early stages of learning to translate.
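
The shuffle test described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `shuffle_test_score` and `toy_plausibility`, the tie-handling rule, and the word-overlap scorer are all assumptions made for illustration. In practice the plausibility scorer would presumably be something like a language model's likelihood of the translated turns, which the abstract does not specify.

```python
import random
from typing import Callable, Sequence


def shuffle_test_score(
    translated_turns: Sequence[str],
    plausibility: Callable[[Sequence[str]], float],
    n_shuffles: int = 100,
    seed: int = 0,
) -> float:
    """Fraction of random permutations that the original order beats.

    Ties count as half a win (an assumption of this sketch), so a
    translator whose outputs carry no order information scores about 0.5.
    """
    rng = random.Random(seed)
    original = list(translated_turns)
    in_order = plausibility(original)
    wins = 0.0
    trials = 0
    for _ in range(n_shuffles):
        shuffled = original[:]
        rng.shuffle(shuffled)
        if shuffled == original:
            continue  # the identity permutation is not a real comparison
        trials += 1
        permuted = plausibility(shuffled)
        if in_order > permuted:
            wins += 1.0
        elif in_order == permuted:
            wins += 0.5
    # With no usable comparisons (e.g. a single turn), report chance level.
    return wins / trials if trials else 0.5


def toy_plausibility(turns: Sequence[str]) -> float:
    """Hypothetical stand-in scorer: counts adjacent turns that share a word."""
    return float(
        sum(1 for a, b in zip(turns, turns[1:]) if set(a.split()) & set(b.split()))
    )


if __name__ == "__main__":
    turns = ["a b", "b c", "c d", "d e"]  # toy "translations" with a clear order
    print(shuffle_test_score(turns, toy_plausibility))
```

A score well above 0.5 indicates that the translated turns are more coherent in their original order than in shuffled order. A score near 0.5 suggests the translations carry little order information, which is the pattern one would expect from fluent but hallucinated output.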
