GPT-Calls: Enhancing Call Segmentation and Tagging by Generating Synthetic Conversations via Large Language Models
June 9, 2023
Authors: Itzik Malkiel, Uri Alon, Yakir Yehuda, Shahar Keren, Oren Barkan, Royi Ronen, Noam Koenigstein
cs.AI
Abstract
Transcriptions of phone calls are of significant value across diverse fields,
such as sales, customer service, healthcare, and law enforcement. Nevertheless,
the analysis of these recorded conversations can be an arduous and
time-intensive process, especially when dealing with extended or multifaceted
dialogues. In this work, we propose a novel method, GPT-distilled Calls
Segmentation and Tagging (GPT-Calls), for efficient and accurate call
segmentation and topic extraction. GPT-Calls is composed of offline and online
phases. The offline phase is applied once to a given list of topics and
involves generating a distribution of synthetic sentences for each topic using
a GPT model and extracting anchor vectors. The online phase is applied to every
call separately and scores the similarity between the transcribed conversation
and the topic anchors found in the offline phase. Then, time-domain analysis is
applied to the similarity scores to group utterances into segments and tag them
with topics. The proposed paradigm provides an accurate and efficient method
for call segmentation and topic extraction that does not require labeled data,
thus making it a versatile approach applicable to various domains. Our
algorithm operates in production under Dynamics 365 Sales Conversation
Intelligence, and our research is based on real sales conversations gathered
from various Dynamics 365 Sales tenants.
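
The two phases described above can be illustrated with a minimal sketch. The abstract does not specify the sentence encoder, how anchor vectors are extracted, or what form the time-domain analysis takes, so everything concrete here is an assumption: a toy bag-of-words `embed` stands in for a real encoder, each anchor is simply the sum of its topic's synthetic sentence vectors, and the time-domain step is a moving average followed by a per-utterance argmax. The synthetic sentences are hand-written placeholders for GPT output, and the names `build_anchors` and `segment_call` are hypothetical.

```python
import math
from collections import Counter
from itertools import groupby


def embed(text):
    # Toy stand-in for a sentence encoder (assumption): word-count vector.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_anchors(synthetic):
    # Offline phase, run once per topic list. Assumption: the anchor is the
    # sum of the topic's sentence vectors (same direction as their mean).
    anchors = {}
    for topic, sentences in synthetic.items():
        total = Counter()
        for sentence in sentences:
            total.update(embed(sentence))
        anchors[topic] = total
    return anchors


def segment_call(utterances, anchors, window=1):
    # Online phase, run per call: score every utterance against every anchor.
    topics = list(anchors)
    scores = [[cosine(embed(u), anchors[t]) for t in topics] for u in utterances]

    # Assumed time-domain step: average each topic's score over neighbouring
    # utterances, then label each utterance with its best-scoring topic.
    labels = []
    for i in range(len(utterances)):
        lo, hi = max(0, i - window), min(len(utterances), i + window + 1)
        smoothed = [
            sum(scores[j][k] for j in range(lo, hi)) / (hi - lo)
            for k in range(len(topics))
        ]
        best = max(range(len(topics)), key=smoothed.__getitem__)
        labels.append(topics[best])

    # Group consecutive utterances with the same label into segments of the
    # form (first_index, last_index, topic).
    segments, start = [], 0
    for topic, run in groupby(labels):
        length = len(list(run))
        segments.append((start, start + length - 1, topic))
        start += length
    return segments


# Hand-written placeholders for GPT-generated synthetic sentences.
synthetic = {
    "pricing": [
        "the price is fifty dollars per seat",
        "we offer a discount on the annual price",
    ],
    "scheduling": [
        "let us schedule a meeting next week",
        "are you free for a meeting on tuesday",
    ],
}
call = [
    "what is the price per seat",
    "is there a discount",
    "great can we schedule a meeting",
    "tuesday next week works",
]
segments = segment_call(call, build_anchors(synthetic))
```

On this toy call, the sketch groups the first two utterances under "pricing" and the last two under "scheduling". No labeled data is involved at any point: only the topic list and its synthetic sentences drive the tagging, which is the property the abstract emphasizes.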