On Non-interactive Evaluation of Animal Communication Translators
October 17, 2025
Authors: Orr Paradise, David F. Gruber, Adam Tauman Kalai
cs.AI
Abstract
If you had an AI Whale-to-English translator, how could you validate whether
or not it is working? Does one need to interact with the animals or rely on
grounded observations such as temperature? We provide theoretical and
proof-of-concept experimental evidence suggesting that interaction and even
observations may not be necessary for sufficiently complex languages. One may
be able to evaluate translators solely by their English outputs, offering
potential advantages in terms of safety, ethics, and cost. This is an instance
of machine translation quality evaluation (MTQE) without any reference
translations available. A key challenge is identifying "hallucinations,"
false translations which may appear fluent and plausible. We propose using
segment-by-segment translation together with the classic NLP shuffle test to
evaluate translators. The idea is to translate animal communication, turn by
turn, and evaluate how often the resulting translations make more sense in
their original order than when permuted. Proof-of-concept experiments on data-scarce human
languages and constructed languages demonstrate the potential utility of this
evaluation methodology. These human-language experiments serve solely to
validate our reference-free metric under data scarcity. The metric is found to
correlate highly with a standard evaluation based on reference translations,
which are available in our experiments. We also perform a theoretical analysis
suggesting that interaction may be neither necessary nor efficient in the early
stages of learning to translate.
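
The shuffle test described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the tie-handling rule, the number of permutations, and the toy scorer are all assumptions made for illustration. In practice the `coherence` callable would presumably be a language-model-based plausibility score over the English translations; here it is a hypothetical stand-in so the example runs on its own.

```python
import random
from typing import Callable, Sequence


def shuffle_test_score(
    segments: Sequence[str],
    coherence: Callable[[Sequence[str]], float],
    num_permutations: int = 100,
    seed: int = 0,
) -> float:
    """Fraction of random permutations that score below the original order.

    Ties count as half a win (an assumption), so a scorer that cannot
    distinguish orderings yields 0.5.
    """
    rng = random.Random(seed)
    original_order = list(segments)
    original_score = coherence(original_order)
    wins = 0.0
    trials = 0
    for _ in range(num_permutations):
        permuted = list(original_order)
        rng.shuffle(permuted)
        if permuted == original_order:
            continue  # a permutation identical to the original is uninformative
        trials += 1
        permuted_score = coherence(permuted)
        if original_score > permuted_score:
            wins += 1.0
        elif original_score == permuted_score:
            wins += 0.5
    # With no usable permutations (e.g. a single segment), return chance level.
    return wins / trials if trials else 0.5


def toy_coherence(segments: Sequence[str]) -> float:
    """Hypothetical stand-in scorer: counts adjacent pairs whose leading
    integers are in ascending order. A real scorer would judge English text."""
    keys = [int(s.split()[0]) for s in segments]
    return float(sum(1 for a, b in zip(keys, keys[1:]) if a < b))


if __name__ == "__main__":
    translated_turns = ["1 hello", "2 who is there", "3 it is me", "4 welcome"]
    print(shuffle_test_score(translated_turns, toy_coherence))
```

Under these assumptions, a score near 1 means the translated segments read better in their original order than when shuffled, while a score near 0.5 means the order carries no detectable signal, which is what one would expect from a translator whose outputs are fluent but unrelated to the input.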