DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation

October 10, 2025
Authors: Enze Zhang, Jiaying Wang, Mengxi Xiao, Jifei Liu, Ziyan Kuang, Rui Dong, Eric Dong, Sophia Ananiadou, Min Peng, Qianqian Xie
cs.AI

Abstract

Large language models (LLMs) have substantially advanced machine translation (MT), yet their effectiveness in translating web novels remains unclear. Existing benchmarks rely on surface-level metrics that fail to capture the distinctive traits of this genre. To address these gaps, we introduce DITING, the first comprehensive evaluation framework for web novel translation, assessing narrative and cultural fidelity across six dimensions: idiom translation, lexical ambiguity, terminology localization, tense consistency, zero-pronoun resolution, and cultural safety, supported by over 18K expert-annotated Chinese-English sentence pairs. We further propose AgentEval, a reasoning-driven multi-agent evaluation framework that simulates expert deliberation to assess translation quality beyond lexical overlap, achieving the highest correlation with human judgments among seven tested automatic metrics. To enable metric comparison, we develop MetricAlign, a meta-evaluation dataset of 300 sentence pairs annotated with error labels and scalar quality scores. Comprehensive evaluation of fourteen open, closed, and commercial models reveals that Chinese-trained LLMs surpass larger foreign counterparts, and that DeepSeek-V3 delivers the most faithful and stylistically coherent translations. Our work establishes a new paradigm for exploring LLM-based web novel translation and provides public resources to advance future research.
October 15, 2025