DiarizationLM: Speaker Diarization Post-Processing with Large Language Models
January 7, 2024
Authors: Quan Wang, Yiling Huang, Guanlong Zhao, Evan Clark, Wei Xia, Hank Liao
cs.AI
Abstract
In this paper, we introduce DiarizationLM, a framework to leverage large
language models (LLMs) to post-process the outputs from a speaker diarization
system. Various goals can be achieved with the proposed framework, such as
improving the readability of the diarized transcript, or reducing the word
diarization error rate (WDER). In this framework, the outputs of the automatic
speech recognition (ASR) and speaker diarization systems are represented as a
compact textual format, which is included in the prompt to an optionally
finetuned LLM. The outputs of the LLM can be used as the refined diarization
results with the desired enhancement. As a post-processing step, this framework
can be easily applied to any off-the-shelf ASR and speaker diarization systems
without retraining existing components. Our experiments show that a finetuned
PaLM 2-S model can reduce the WDER by a relative 25.9% on the Fisher telephone
conversation dataset, and by a relative 31% on the Callhome English dataset.
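
The abstract reports WDER improvements without defining the metric. For background (stated here as a commonly used definition from earlier work on joint speech recognition and diarization, not quoted from this paper), the word diarization error rate can be written as

    \mathrm{WDER} = \frac{S_{\mathrm{IS}} + C_{\mathrm{IS}}}{S + C}

where S_IS counts ASR substitution errors carrying an incorrect speaker label, C_IS counts correctly recognized words carrying an incorrect speaker label, and S and C are the total numbers of substitutions and correct words. Insertions and deletions are excluded, since such words cannot be attributed to a reference speaker.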
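As a concrete illustration of the "compact textual format" mentioned above, here is a minimal Python sketch of how word-level ASR and diarization outputs might be serialized into a single string and wrapped into an LLM prompt. The "<speaker:k>" token convention, the helper names, and the " --> " prompt suffix are illustrative assumptions, not the paper's exact specification.

    # Minimal sketch: serialize ASR words + per-word speaker labels into a
    # compact diarized string, then build a prompt for the (optionally
    # finetuned) LLM, which completes it with a corrected transcript.

    def words_to_compact_text(words, speakers):
        """Serialize ASR words and per-word speaker labels into one string."""
        segments = []
        prev_spk = None
        for word, spk in zip(words, speakers):
            if spk != prev_spk:
                # Emit a speaker token only when the speaker changes.
                segments.append(f"<speaker:{spk}>")
                prev_spk = spk
            segments.append(word)
        return " ".join(segments)

    def build_prompt(diarized_text, suffix=" --> "):
        """Form the LLM prompt; the suffix separating input from completion
        is an assumed separator, not necessarily the one used in the paper."""
        return diarized_text + suffix

    # Example word-level hypothesis from an ASR + diarization system.
    words = ["good", "morning", "how", "are", "you", "good", "morning"]
    speakers = [1, 1, 1, 1, 1, 2, 2]
    print(build_prompt(words_to_compact_text(words, speakers)))
    # -> "<speaker:1> good morning how are you <speaker:2> good morning --> "

Because the LLM's completion uses the same textual format, its corrected speaker tokens can be parsed back into word-level speaker labels, which is what makes the framework applicable as a pure post-processing step on top of any off-the-shelf ASR and diarization stack.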