DiarizationLM: Speaker Diarization Post-Processing with Large Language Models
January 7, 2024
Authors: Quan Wang, Yiling Huang, Guanlong Zhao, Evan Clark, Wei Xia, Hank Liao
cs.AI
Abstract
In this paper, we introduce DiarizationLM, a framework to leverage large
language models (LLM) to post-process the outputs from a speaker diarization
system. Various goals can be achieved with the proposed framework, such as
improving the readability of the diarized transcript, or reducing the word
diarization error rate (WDER). In this framework, the outputs of the automatic
speech recognition (ASR) and speaker diarization systems are represented as a
compact textual format, which is included in the prompt to an optionally
finetuned LLM. The outputs of the LLM can be used as the refined diarization
results with the desired enhancement. As a post-processing step, this framework
can be easily applied to any off-the-shelf ASR and speaker diarization systems
without retraining existing components. Our experiments show that a finetuned
PaLM 2-S model can reduce the WDER by rel. 25.9% on the Fisher telephone
conversation dataset, and rel. 31% on the Callhome English dataset.
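To make the post-processing step concrete, the sketch below shows how per-word ASR output and speaker labels might be serialized into a compact text and wrapped in an LLM prompt. The `<spk:N>` tag format, the `to_compact_text` helper, and the prompt wording are illustrative assumptions, not the paper's exact format.

```python
# Sketch of the prompt-building step described in the abstract: ASR words and
# per-word speaker labels are serialized into a compact text that is included
# in the prompt to an (optionally finetuned) LLM. Tag format is assumed.

def to_compact_text(words, speakers):
    """Serialize (word, speaker) pairs, emitting a speaker tag only on changes."""
    parts, prev = [], None
    for word, spk in zip(words, speakers):
        if spk != prev:
            parts.append(f"<spk:{spk}>")
            prev = spk
        parts.append(word)
    return " ".join(parts)

def build_prompt(compact_text):
    # Hypothetical instruction; a finetuned model may need no instruction text.
    return ("Correct the speaker labels in the following transcript:\n"
            + compact_text)

words = ["hello", "how", "are", "you", "good", "thanks"]
speakers = [1, 1, 1, 1, 2, 2]
print(to_compact_text(words, speakers))
# <spk:1> hello how are you <spk:2> good thanks
```

The LLM's completion would be parsed back from the same compact format to recover refined word-level speaker assignments, which is what allows the framework to sit after any off-the-shelf ASR and diarization systems.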