How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System?
December 24, 2024
Authors: Sara Papi, Peter Polak, Ondřej Bojar, Dominik Macháček
cs.AI
Abstract
Simultaneous speech-to-text translation (SimulST) translates source-language
speech into target-language text concurrently with the speaker's speech,
ensuring low latency for better user comprehension. Despite its intended
application to unbounded speech, most research has focused on human
pre-segmented speech, simplifying the task and overlooking significant
challenges. This narrow focus, coupled with widespread terminological
inconsistencies, is limiting the applicability of research outcomes to
real-world applications, ultimately hindering progress in the field. Our
extensive literature review of 110 papers not only reveals these critical
issues in current research but also serves as the foundation for our key
contributions. We 1) define the steps and core components of a SimulST system,
proposing a standardized terminology and taxonomy; 2) conduct a thorough
analysis of community trends; and 3) offer concrete recommendations and future
directions to bridge the gaps in existing literature, from evaluation
frameworks to system architectures, for advancing the field towards more
realistic and effective SimulST solutions.