LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study

Abstract

Grapheme-to-phoneme (G2P) conversion is critical in speech processing, particularly for applications such as speech synthesis. In languages with polyphonic words and context-dependent phonemes, G2P systems need linguistic understanding and contextual awareness to choose the correct pronunciation. Large language models (LLMs) have recently shown significant potential across a range of language tasks, suggesting that their phonetic knowledge could be leveraged for G2P. In this paper, we evaluate the performance of LLMs on G2P conversion and introduce prompting and post-processing methods that improve LLM outputs without additional training or labeled data. We also present a benchmark dataset designed to assess G2P performance on sentence-level phonetic challenges in Persian. Our results show that, with the proposed methods, LLMs can outperform traditional G2P tools even in an underrepresented language such as Persian, highlighting the potential of LLM-aided G2P systems.
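The abstract does not state how the benchmark scores G2P output. As a hedged illustration only, G2P systems are commonly compared with phoneme error rate (PER): the Levenshtein distance between predicted and reference phoneme sequences, summed over a corpus and divided by the total number of reference phonemes. The sketch below assumes that metric; the function names `edit_distance` and `phoneme_error_rate` and the tokenized phoneme lists are hypothetical and are not the paper's evaluation code.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences.

    Substitutions, insertions, and deletions each cost 1.
    (Assumed metric for illustration; not taken from the paper.)
    """
    # prev[j] = distance between the first i-1 reference phonemes and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete reference phoneme r
                curr[j - 1] + 1,           # insert hypothesis phoneme h
                prev[j - 1] + (r != h),    # substitute (free when r == h)
            )
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """Corpus-level PER: total edits divided by total reference phonemes."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of items")
    total_ref = sum(len(r) for r in refs)
    if total_ref == 0:
        raise ValueError("reference phoneme sequences are empty")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_ref


if __name__ == "__main__":
    # Hypothetical phoneme tokens: one vowel substitution in a 5-phoneme word.
    ref = ["s", "a", "l", "a", "m"]
    hyp = ["s", "a", "l", "o", "m"]
    print(phoneme_error_rate([ref], [hyp]))
```

Pooling edits over the whole corpus, rather than averaging per-sentence rates, keeps short sentences from dominating the score; whether the benchmark uses this convention is not stated in the abstract.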