Augmenting text for spoken language understanding with Large Language Models
September 17, 2023
Authors: Roshan Sharma, Suyoun Kim, Daniel Lazar, Trang Le, Akshat Shrivastava, Kwanghoon Ahn, Piyush Kansal, Leda Sari, Ozlem Kalinli, Michael Seltzer
cs.AI
Abstract
Spoken semantic parsing (SSP) involves generating machine-comprehensible
parses from input speech. Training robust models for existing application
domains represented in training data or extending to new domains requires
corresponding triplets of speech-transcript-semantic parse data, which is
expensive to obtain. In this paper, we address this challenge by examining
methods that can use transcript-semantic parse data (unpaired text) without
corresponding speech. First, when unpaired text is drawn from existing textual
corpora, Joint Audio Text (JAT) and Text-to-Speech (TTS) are compared as ways
to generate speech representations for unpaired text. Experiments on the STOP
dataset show that unpaired text from existing and new domains improves
performance by 2% and 30% in absolute Exact Match (EM) respectively. Second, we
consider the setting where unpaired text is not available in existing textual
corpora. We propose to prompt Large Language Models (LLMs) to generate unpaired
text for existing and new domains. Experiments show that examples and words
that co-occur with intents can be used to generate unpaired text with Llama
2.0. Using the generated text with JAT and TTS for spoken semantic parsing
improves EM on STOP by 1.4% and 2.6% absolute for existing and new domains
respectively.
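
The two augmentation routes above lend themselves to short sketches. First, a minimal, hypothetical sketch of the JAT idea, assuming the common formulation in which embeddings of unpaired transcript tokens stand in for speech encoder outputs, so a single downstream parser decoder can train on paired triplets and unpaired text alike; the module and its dimensions (JATEncoder, n_mels, d_model) are illustrative, not the paper's implementation:

```python
import torch
import torch.nn as nn

class JATEncoder(nn.Module):
    """Shared encoder that accepts either speech features or text tokens."""
    def __init__(self, n_mels=80, vocab_size=10_000, d_model=256):
        super().__init__()
        self.speech_proj = nn.Linear(n_mels, d_model)        # log-mel -> model dim
        self.text_embed = nn.Embedding(vocab_size, d_model)  # tokens  -> model dim
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.shared = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, speech=None, text_tokens=None):
        # Paired triplets supply speech features (B, T, n_mels); unpaired
        # transcript-parse pairs supply token ids (B, L) instead.
        x = self.speech_proj(speech) if speech is not None else self.text_embed(text_tokens)
        return self.shared(x)  # (B, T or L, d_model), fed to the parse decoder

enc = JATEncoder()
speech_reps = enc(speech=torch.randn(4, 120, 80))               # paired batch
text_reps = enc(text_tokens=torch.randint(0, 10_000, (4, 12)))  # unpaired batch
```

The TTS route replaces the text-embedding branch by synthesizing audio for the unpaired text and encoding it like real speech, which costs more compute but yields representations closer to the paired data.

Second, a sketch of the prompting recipe the abstract describes: exemplar utterances plus words that co-occur with an intent. The checkpoint name, prompt wording, and the helper make_prompt are assumptions for illustration, not the paper's exact setup (the Llama 2 checkpoints on Hugging Face are gated and require access):

```python
from transformers import pipeline

# Assumed checkpoint; any instruction-tuned Llama 2 variant would do here.
generator = pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf")

def make_prompt(intent, exemplars, cooccurring_words, n=5):
    """Build a prompt asking for n new utterances carrying one intent."""
    examples = "\n".join(f"- {e}" for e in exemplars)
    words = ", ".join(cooccurring_words)
    return (
        f"Example user requests with intent {intent}:\n{examples}\n"
        f"Words that often occur with this intent: {words}.\n"
        f"Write {n} new, diverse user requests with the same intent:\n"
    )

prompt = make_prompt(
    intent="IN:CREATE_ALARM",  # an intent label in the STOP/TOPv2 style
    exemplars=["set an alarm for 7 am", "wake me up at noon tomorrow"],
    cooccurring_words=["alarm", "wake", "morning", "remind"],
)
out = generator(prompt, max_new_tokens=128, do_sample=True, temperature=0.9)
print(out[0]["generated_text"])
```

In the paper's setting, "unpaired text" means transcript-semantic parse pairs without audio, so utterances generated this way would still need parses (a step the sketch leaves out) and speech representations via JAT or TTS before they can train the spoken semantic parser.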