DiarizationLM: Speaker Diarization Post-Processing with Large Language Models
January 7, 2024
Authors: Quan Wang, Yiling Huang, Guanlong Zhao, Evan Clark, Wei Xia, Hank Liao
cs.AI
Abstract
In this paper, we introduce DiarizationLM, a framework to leverage large
language models (LLM) to post-process the outputs from a speaker diarization
system. Various goals can be achieved with the proposed framework, such as
improving the readability of the diarized transcript, or reducing the word
diarization error rate (WDER). In this framework, the outputs of the automatic
speech recognition (ASR) and speaker diarization systems are represented as a
compact textual format, which is included in the prompt to an optionally
finetuned LLM. The outputs of the LLM can be used as the refined diarization
results with the desired enhancement. As a post-processing step, this framework
can be easily applied to any off-the-shelf ASR and speaker diarization systems
without retraining existing components. Our experiments show that a finetuned
PaLM 2-S model can reduce the WDER by rel. 25.9% on the Fisher telephone
conversation dataset, and rel. 31% on the Callhome English dataset.
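To make the post-processing step concrete, the sketch below shows how per-word ASR output and speaker labels might be serialized into a compact text and wrapped in an LLM prompt. The `<spk:N>` tag format, the `to_compact_text` helper, and the prompt wording are illustrative assumptions, not the paper's exact format.

```python
# Sketch of the prompt-building step described in the abstract: ASR words and
# per-word speaker labels are serialized into a compact text that is included
# in the prompt to an (optionally finetuned) LLM. Tag format is assumed.

def to_compact_text(words, speakers):
    """Serialize (word, speaker) pairs, emitting a speaker tag only on changes."""
    parts, prev = [], None
    for word, spk in zip(words, speakers):
        if spk != prev:
            parts.append(f"<spk:{spk}>")
            prev = spk
        parts.append(word)
    return " ".join(parts)

def build_prompt(compact_text):
    # Hypothetical instruction; a finetuned model may need no instruction text.
    return ("Correct the speaker labels in the following transcript:\n"
            + compact_text)

words = ["hello", "how", "are", "you", "good", "thanks"]
speakers = [1, 1, 1, 1, 2, 2]
print(to_compact_text(words, speakers))
# <spk:1> hello how are you <spk:2> good thanks
```

The LLM's completion would be parsed back from the same compact format to recover refined word-level speaker assignments, which is what allows the framework to sit after any off-the-shelf ASR and diarization systems.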