Augmenting text for spoken language understanding with Large Language Models
September 17, 2023
Authors: Roshan Sharma, Suyoun Kim, Daniel Lazar, Trang Le, Akshat Shrivastava, Kwanghoon Ahn, Piyush Kansal, Leda Sari, Ozlem Kalinli, Michael Seltzer
cs.AI
Abstract
Spoken semantic parsing (SSP) involves generating machine-comprehensible
parses from input speech. Training robust models for existing application
domains represented in training data or extending to new domains requires
corresponding triplets of speech-transcript-semantic parse data, which is
expensive to obtain. In this paper, we address this challenge by examining
methods that can use transcript-semantic parse data (unpaired text) without
corresponding speech. First, when unpaired text is drawn from existing textual
corpora, Joint Audio Text (JAT) and Text-to-Speech (TTS) are compared as ways
to generate speech representations for unpaired text. Experiments on the STOP
dataset show that unpaired text from existing and new domains improves
performance by 2% and 30% in absolute Exact Match (EM) respectively. Second, we
consider the setting when unpaired text is not available in existing textual
corpora. We propose to prompt Large Language Models (LLMs) to generate unpaired
text for existing and new domains. Experiments show that examples and words
that co-occur with intents can be used to generate unpaired text with Llama
2.0. Using the generated text with JAT and TTS for spoken semantic parsing
improves EM on STOP by 1.4% and 2.6% absolute for existing and new domains
respectively.
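The abstract's second contribution, prompting an LLM with in-domain exemplars and words that co-occur with an intent in order to synthesize unpaired transcript-parse text, can be sketched in a few lines. The following is a minimal, hypothetical illustration rather than the authors' released code: the Llama 2 checkpoint, the alarm-domain exemplars, and the TOP-style parse labels are assumptions chosen for clarity.

```python
# Sketch: prompting Llama 2 to generate unpaired transcript-parse text for a
# target domain. Checkpoint, exemplars, and prompt wording are illustrative
# assumptions, not the paper's exact setup.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",  # assumed checkpoint; any Llama 2 model works
)

# Exemplar (transcript, semantic parse) pairs plus words that co-occur with
# the target intent, mirroring the prompting signals described in the abstract.
exemplars = [
    ("set an alarm for 7 am",
     "[IN:CREATE_ALARM [SL:DATE_TIME for 7 am ] ]"),
    ("wake me up at noon tomorrow",
     "[IN:CREATE_ALARM [SL:DATE_TIME at noon tomorrow ] ]"),
]
intent_words = ["alarm", "wake", "remind", "snooze"]

# Build a few-shot prompt from the exemplars and co-occurring words.
prompt = "Generate new utterances with semantic parses for the alarm domain.\n"
prompt += "Use words such as: " + ", ".join(intent_words) + "\n\n"
for transcript, parse in exemplars:
    prompt += f"Utterance: {transcript}\nParse: {parse}\n\n"
prompt += "Utterance:"

# Sample a continuation and strip the prompt to keep only the generated text.
outputs = generator(prompt, max_new_tokens=128, do_sample=True, temperature=0.9)
print(outputs[0]["generated_text"][len(prompt):])
```

In the paper's pipeline, text generated this way would then be given speech representations via JAT or synthesized audio via TTS before being used to train the spoken semantic parser.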