DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation

October 10, 2025
Authors: Enze Zhang, Jiaying Wang, Mengxi Xiao, Jifei Liu, Ziyan Kuang, Rui Dong, Eric Dong, Sophia Ananiadou, Min Peng, Qianqian Xie
cs.AI

Abstract

Large language models (LLMs) have substantially advanced machine translation (MT), yet their effectiveness in translating web novels remains unclear. Existing benchmarks rely on surface-level metrics that fail to capture the distinctive traits of this genre. To address these gaps, we introduce DITING, the first comprehensive evaluation framework for web novel translation, assessing narrative and cultural fidelity across six dimensions: idiom translation, lexical ambiguity, terminology localization, tense consistency, zero-pronoun resolution, and cultural safety, supported by over 18K expert-annotated Chinese-English sentence pairs. We further propose AgentEval, a reasoning-driven multi-agent evaluation framework that simulates expert deliberation to assess translation quality beyond lexical overlap, achieving the highest correlation with human judgments among seven tested automatic metrics. To enable metric comparison, we develop MetricAlign, a meta-evaluation dataset of 300 sentence pairs annotated with error labels and scalar quality scores. Comprehensive evaluation of fourteen open, closed, and commercial models reveals that Chinese-trained LLMs surpass larger foreign counterparts, and that DeepSeek-V3 delivers the most faithful and stylistically coherent translations. Our work establishes a new paradigm for exploring LLM-based web novel translation and provides public resources to advance future research.
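The meta-evaluation idea in the abstract, ranking automatic metrics by how well their scores track human judgments, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not say which correlation coefficient is used (Spearman rank correlation is chosen here), and the metric names and scores are made-up toy values, not data from MetricAlign.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: human quality scores for five translations and
# the scores two hypothetical automatic metrics assign to the same items.
human = [4.5, 2.0, 3.5, 1.0, 5.0]
metric_scores = {
    "metric_a": [0.9, 0.4, 0.7, 0.2, 0.95],
    "metric_b": [0.6, 0.7, 0.5, 0.4, 0.8],
}

correlations = {
    name: spearman(scores, human) for name, scores in metric_scores.items()
}
best = max(correlations, key=correlations.get)
print(correlations, best)
```

In this toy setup, the metric with the highest correlation is the one whose ranking of translations agrees most closely with the human ranking, which is the sense in which one metric can be said to align best with human judgment.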