
Sentence-wise Speech Summarization: Task, Datasets, and End-to-End Modeling with LM Knowledge Distillation

August 1, 2024
作者: Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Masato Mimura, Takatomo Kano, Atsunori Ogawa, Marc Delcroix
cs.AI

Abstract

This paper introduces a novel approach called sentence-wise speech summarization (Sen-SSum), which generates text summaries from a spoken document in a sentence-by-sentence manner. Sen-SSum combines the real-time processing of automatic speech recognition (ASR) with the conciseness of speech summarization. To explore this approach, we present two datasets for Sen-SSum: Mega-SSum and CSJ-SSum. Using these datasets, our study evaluates two types of Transformer-based models: 1) cascade models that combine ASR and strong text summarization models, and 2) end-to-end (E2E) models that directly convert speech into a text summary. While E2E models are appealing for developing compute-efficient systems, they perform worse than cascade models. Therefore, we propose knowledge distillation for E2E models using pseudo-summaries generated by the cascade models. Our experiments show that this proposed knowledge distillation effectively improves the performance of the E2E model on both datasets.
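The distillation recipe described above can be sketched in a few lines: the cascade teacher (ASR followed by a text summarizer) labels speech utterances with pseudo-summaries, and the resulting (speech, pseudo-summary) pairs become training data for the E2E student. This is a minimal illustrative sketch, not the paper's implementation; all function names and the toy models are assumptions.

```python
# Hedged sketch of sequence-level knowledge distillation for Sen-SSum.
# A cascade teacher (ASR -> text summarization) produces pseudo-summaries
# for speech utterances; the E2E student would then be trained on the
# resulting pairs. The toy models below are illustrative stand-ins.

def cascade_teacher(speech_utterance, asr, summarizer):
    """Cascade model: transcribe with ASR, then summarize the transcript."""
    transcript = asr(speech_utterance)
    return summarizer(transcript)

def build_distillation_set(speech_corpus, asr, summarizer):
    """Pair each sentence-level utterance with a teacher pseudo-summary."""
    return [(utt, cascade_teacher(utt, asr, summarizer))
            for utt in speech_corpus]

# Toy stand-ins so the sketch runs end to end (hypothetical, not real models).
toy_asr = lambda utt: utt.lower()                  # pretend transcription
toy_summarizer = lambda text: text.split(",")[0]   # pretend summarization
corpus = ["THE MODEL WAS TRAINED, THEN EVALUATED ON TWO DATASETS"]

distill_pairs = build_distillation_set(corpus, toy_asr, toy_summarizer)
# Each (speech, pseudo-summary) pair would be fed to the E2E student
# with a standard sequence-to-sequence cross-entropy objective.
```

In practice the teacher's pseudo-summaries substitute for (or augment) human-written reference summaries, which is what lets the compute-efficient E2E student close the gap to the stronger cascade model.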

