DiarizationLM: Speaker Diarization Post-Processing with Large Language Models
January 7, 2024
Authors: Quan Wang, Yiling Huang, Guanlong Zhao, Evan Clark, Wei Xia, Hank Liao
cs.AI
Abstract
In this paper, we introduce DiarizationLM, a framework to leverage large
language models (LLMs) to post-process the outputs from a speaker diarization
system. Various goals can be achieved with the proposed framework, such as
improving the readability of the diarized transcript, or reducing the word
diarization error rate (WDER). In this framework, the outputs of the automatic
speech recognition (ASR) and speaker diarization systems are represented as a
compact textual format, which is included in the prompt to an optionally
finetuned LLM. The outputs of the LLM can be used as the refined diarization
results with the desired enhancement. As a post-processing step, this framework
can be easily applied to any off-the-shelf ASR and speaker diarization systems
without retraining existing components. Our experiments show that a finetuned
PaLM 2-S model can reduce the WDER by a relative 25.9% on the Fisher telephone
conversation dataset, and by a relative 31% on the Callhome English dataset.
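
The abstract reports WDER improvements without defining the metric. For background (stated here as a commonly used definition from earlier work on joint speech recognition and diarization, not quoted from this paper), the word diarization error rate can be written as

    \mathrm{WDER} = \frac{S_{\mathrm{IS}} + C_{\mathrm{IS}}}{S + C}

where S_IS counts ASR substitution errors carrying an incorrect speaker label, C_IS counts correctly recognized words carrying an incorrect speaker label, and S and C are the total numbers of substitutions and correct words. Insertions and deletions are excluded, since such words cannot be attributed to a reference speaker.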
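As a concrete illustration of the "compact textual format" mentioned above, here is a minimal Python sketch of how word-level ASR and diarization outputs might be serialized into a single string and wrapped into an LLM prompt. The "<speaker:k>" token convention, the helper names, and the " --> " prompt suffix are illustrative assumptions, not the paper's exact specification.

    # Minimal sketch: serialize ASR words + per-word speaker labels into a
    # compact diarized string, then build a prompt for the (optionally
    # finetuned) LLM, which completes it with a corrected transcript.

    def words_to_compact_text(words, speakers):
        """Serialize ASR words and per-word speaker labels into one string."""
        segments = []
        prev_spk = None
        for word, spk in zip(words, speakers):
            if spk != prev_spk:
                # Emit a speaker token only when the speaker changes.
                segments.append(f"<speaker:{spk}>")
                prev_spk = spk
            segments.append(word)
        return " ".join(segments)

    def build_prompt(diarized_text, suffix=" --> "):
        """Form the LLM prompt; the suffix separating input from completion
        is an assumed separator, not necessarily the one used in the paper."""
        return diarized_text + suffix

    # Example word-level hypothesis from an ASR + diarization system.
    words = ["good", "morning", "how", "are", "you", "good", "morning"]
    speakers = [1, 1, 1, 1, 1, 2, 2]
    print(build_prompt(words_to_compact_text(words, speakers)))
    # -> "<speaker:1> good morning how are you <speaker:2> good morning --> "

Because the LLM's completion uses the same textual format, its corrected speaker tokens can be parsed back into word-level speaker labels, which is what makes the framework applicable as a pure post-processing step on top of any off-the-shelf ASR and diarization stack.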