On Non-interactive Evaluation of Animal Communication Translators

Abstract

If you had an AI whale-to-English translator, how could you validate whether it is working? Does one need to interact with the animals or rely on grounded observations such as temperature? We provide theoretical and proof-of-concept experimental evidence suggesting that interaction, and even observation, may not be necessary for sufficiently complex languages. One may be able to evaluate translators solely by their English outputs, offering potential advantages in terms of safety, ethics, and cost. This is an instance of machine translation quality evaluation (MTQE) without any reference translations available. A key challenge is identifying "hallucinations," false translations that may appear fluent and plausible. We propose using segment-by-segment translation together with the classic NLP shuffle test to evaluate translators. The idea is to translate animal communication turn by turn and to measure how often the resulting translations make more sense in their original order than when permuted. Proof-of-concept experiments on data-scarce human languages and constructed languages demonstrate the potential utility of this evaluation methodology. These human-language experiments serve solely to validate our reference-free metric under data scarcity. The metric is found to correlate highly with a standard evaluation based on reference translations, which are available in our experiments. We also perform a theoretical analysis suggesting that interaction may be neither necessary nor efficient in the early stages of learning to translate.
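As a rough illustration of the shuffle test described above, the following Python sketch counts how often a sequence of translated turns scores higher in its original order than in a random permutation. It is only a sketch under stated assumptions: the function names, the `plausibility` scorer interface, the default number of permutations, and the toy word-overlap scorer are all hypothetical and not taken from the paper. In practice the scorer would presumably be a much stronger judge of coherence, such as a language model.

```python
import random
from typing import Callable, Sequence


def toy_plausibility(turns: Sequence[str]) -> float:
    """Hypothetical stand-in scorer: counts words shared by adjacent turns.

    A real evaluation would replace this with a proper coherence judge.
    """
    score = 0
    for prev, curr in zip(turns, turns[1:]):
        score += len(set(prev.lower().split()) & set(curr.lower().split()))
    return float(score)


def shuffle_test_score(
    conversations: Sequence[Sequence[str]],
    plausibility: Callable[[Sequence[str]], float],
    num_permutations: int = 20,  # assumed default, chosen for illustration
    seed: int = 0,
) -> float:
    """Fraction of comparisons in which the original order beats a shuffle.

    Each conversation is a list of translated turns. Returns NaN when no
    comparison is possible (for example, every conversation has one turn).
    """
    rng = random.Random(seed)
    wins = 0
    total = 0
    for conversation in conversations:
        turns = list(conversation)
        if len(set(turns)) < 2:
            continue  # no permutation differs from the original order
        original_score = plausibility(turns)
        for _ in range(num_permutations):
            shuffled = turns[:]
            while shuffled == turns:  # resample until the order really changes
                rng.shuffle(shuffled)
            total += 1
            if original_score > plausibility(shuffled):
                wins += 1
    return wins / total if total else float("nan")
```

Under this sketch, a translator whose outputs read no more coherently in order than shuffled would rarely win these comparisons, while one that preserves genuine turn-to-turn structure should win more often. Ties are counted as losses here, which is one design choice among several.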
