

Naturalistic Music Decoding from EEG Data via Latent Diffusion Models

May 15, 2024
作者: Emilian Postolache, Natalia Polouliakh, Hiroaki Kitano, Akima Connelly, Emanuele Rodolà, Taketo Akama
cs.AI

Abstract

In this article, we explore the potential of using latent diffusion models, a family of powerful generative models, for the task of reconstructing naturalistic music from electroencephalogram (EEG) recordings. Unlike simpler music with limited timbres, such as MIDI-generated tunes or monophonic pieces, the focus here is on intricate music featuring a diverse array of instruments, voices, and effects, rich in harmonics and timbre. This study represents an initial foray into achieving high-quality, general music reconstruction using non-invasive EEG data, employing an end-to-end training approach directly on raw data, without the need for manual pre-processing and channel selection. We train our models on the public NMED-T dataset and propose neural embedding-based metrics for quantitative evaluation. We additionally perform song classification based on the generated tracks. Our work contributes to ongoing research in neural decoding and brain-computer interfaces, offering insights into the feasibility of using EEG data for complex auditory information reconstruction.
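The abstract mentions neural embedding-based metrics for quantitative evaluation. A common instantiation of this idea is to embed both the generated and the reference audio with a pretrained encoder and score their similarity. The sketch below illustrates only the scoring step (mean pairwise cosine similarity); the embedding model itself is not specified in the abstract and is an assumption here, so the function accepts precomputed embedding vectors.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def embedding_score(gen_embs, ref_embs):
    """Mean pairwise cosine similarity between matched
    generated/reference embedding pairs (1.0 = identical directions)."""
    return float(np.mean([cosine_similarity(g, r)
                          for g, r in zip(gen_embs, ref_embs)]))
```

For example, `embedding_score([[1, 0], [0, 1]], [[1, 0], [0, 1]])` returns `1.0`, while fully orthogonal pairs score `0.0`. The actual metrics in the paper may differ; this only shows the general embedding-comparison pattern.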

