LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study

Abstract

Grapheme-to-phoneme (G2P) conversion is critical in speech processing, particularly for applications such as speech synthesis. For languages with polyphone words and context-dependent phonemes, G2P systems need both linguistic understanding and contextual awareness. Large language models (LLMs) have recently demonstrated significant potential in a variety of language tasks, suggesting that their phonetic knowledge could be leveraged for G2P. In this paper, we evaluate the performance of LLMs in G2P conversion and introduce prompting and post-processing methods that enhance LLM outputs without additional training or labeled data. We also present a benchmark dataset designed to assess G2P performance on sentence-level phonetic challenges of the Persian language. Our results show that, with the proposed methods applied, LLMs can outperform traditional G2P tools even in an underrepresented language like Persian, highlighting the potential of developing LLM-aided G2P systems.
