FinAudio: A Benchmark for Audio Large Language Models in Financial Applications

Abstract

Audio Large Language Models (AudioLLMs) have received widespread attention and have significantly improved performance on audio tasks such as conversation, audio understanding, and automatic speech recognition (ASR). Despite these advances, no benchmark exists for assessing AudioLLMs in financial scenarios, where audio data such as earnings conference calls and CEO speeches are crucial resources for financial analysis and investment decisions. In this paper, we introduce FinAudio, the first benchmark designed to evaluate the capabilities of AudioLLMs in the financial domain. We first define three tasks based on the unique characteristics of the financial domain: 1) ASR for short financial audio, 2) ASR for long financial audio, and 3) summarization of long financial audio. We then curate two short-audio datasets and two long-audio datasets and develop a novel dataset for financial audio summarization; together, these constitute the FinAudio benchmark. Next, we evaluate seven prevalent AudioLLMs on FinAudio. Our evaluation reveals the limitations of existing AudioLLMs in the financial domain and offers insights for improving them. All datasets and code will be publicly released.
