Augmenting text for spoken language understanding with Large Language Models
September 17, 2023
Authors: Roshan Sharma, Suyoun Kim, Daniel Lazar, Trang Le, Akshat Shrivastava, Kwanghoon Ahn, Piyush Kansal, Leda Sari, Ozlem Kalinli, Michael Seltzer
cs.AI
Abstract
Spoken semantic parsing (SSP) involves generating machine-comprehensible
parses from input speech. Training robust models for existing application
domains represented in training data or extending to new domains requires
corresponding triplets of speech-transcript-semantic parse data, which is
expensive to obtain. In this paper, we address this challenge by examining
methods that can use transcript-semantic parse data (unpaired text) without
corresponding speech. First, when unpaired text is drawn from existing textual
corpora, Joint Audio Text (JAT) and Text-to-Speech (TTS) are compared as ways
to generate speech representations for unpaired text. Experiments on the STOP
dataset show that unpaired text from existing and new domains improves
performance by 2% and 30% in absolute Exact Match (EM) respectively. Second, we
consider the setting when unpaired text is not available in existing textual
corpora. We propose to prompt Large Language Models (LLMs) to generate unpaired
text for existing and new domains. Experiments show that examples and words
that co-occur with intents can be used to generate unpaired text with Llama
2.0. Using the generated text with JAT and TTS for spoken semantic parsing
improves EM on STOP by 1.4% and 2.6% absolute for existing and new domains
respectively.
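The abstract's second contribution, prompting an LLM with in-domain exemplars and words that co-occur with an intent in order to synthesize unpaired transcript-parse text, can be sketched in a few lines. The following is a minimal, hypothetical illustration rather than the authors' released code: the Llama 2 checkpoint, the alarm-domain exemplars, and the TOP-style parse labels are assumptions chosen for clarity.

```python
# Sketch: prompting Llama 2 to generate unpaired transcript-parse text for a
# target domain. Checkpoint, exemplars, and prompt wording are illustrative
# assumptions, not the paper's exact setup.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",  # assumed checkpoint; any Llama 2 model works
)

# Exemplar (transcript, semantic parse) pairs plus words that co-occur with
# the target intent, mirroring the prompting signals described in the abstract.
exemplars = [
    ("set an alarm for 7 am",
     "[IN:CREATE_ALARM [SL:DATE_TIME for 7 am ] ]"),
    ("wake me up at noon tomorrow",
     "[IN:CREATE_ALARM [SL:DATE_TIME at noon tomorrow ] ]"),
]
intent_words = ["alarm", "wake", "remind", "snooze"]

# Build a few-shot prompt from the exemplars and co-occurring words.
prompt = "Generate new utterances with semantic parses for the alarm domain.\n"
prompt += "Use words such as: " + ", ".join(intent_words) + "\n\n"
for transcript, parse in exemplars:
    prompt += f"Utterance: {transcript}\nParse: {parse}\n\n"
prompt += "Utterance:"

# Sample a continuation and strip the prompt to keep only the generated text.
outputs = generator(prompt, max_new_tokens=128, do_sample=True, temperature=0.9)
print(outputs[0]["generated_text"][len(prompt):])
```

In the paper's pipeline, text generated this way would then be given speech representations via JAT or synthesized audio via TTS before being used to train the spoken semantic parser.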