Simulstream: Open-Source Toolkit for Evaluation and Demonstration of Streaming Speech-to-Text Translation Systems

Abstract

Streaming Speech-to-Text Translation (StreamST) requires producing translations concurrently with incoming speech. This imposes strict latency constraints and demands models that balance decision-making on partial information with high translation quality. Research on the topic has so far relied on the SimulEval repository, which is no longer maintained and does not support systems that revise their outputs. In addition, SimulEval was designed to simulate the processing of short segments rather than long-form audio streams, and it offers no easy way to showcase systems in a demo. As a solution, we introduce simulstream, the first open-source framework dedicated to the unified evaluation and demonstration of StreamST systems. Designed for long-form speech processing, it supports not only incremental decoding approaches but also re-translation methods, enabling their comparison within the same framework in terms of both quality and latency. It also offers an interactive web interface to demo any system built within the tool.
