Naturalistic Music Decoding from EEG Data via Latent Diffusion Models
May 15, 2024
Authors: Emilian Postolache, Natalia Polouliakh, Hiroaki Kitano, Akima Connelly, Emanuele Rodolà, Taketo Akama
cs.AI
Abstract
In this article, we explore the potential of using latent diffusion models, a
family of powerful generative models, for the task of reconstructing
naturalistic music from electroencephalogram (EEG) recordings. Unlike simpler
music with limited timbres, such as MIDI-generated tunes or monophonic pieces,
the focus here is on intricate music featuring a diverse array of instruments,
voices, and effects, rich in harmonics and timbre. This study represents an
initial foray into achieving high-quality general music reconstruction using
non-invasive EEG data, employing an end-to-end training approach directly on
raw data without the need for manual pre-processing and channel selection. We
train our models on the public NMED-T dataset and perform quantitative
evaluation, proposing neural embedding-based metrics. We additionally perform
song classification based on the generated tracks. Our work contributes to the
ongoing research in neural decoding and brain-computer interfaces, offering
insights into the feasibility of using EEG data for complex auditory
information reconstruction.
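The abstract does not spell out the architecture, but the general recipe it alludes to (an EEG encoder over raw multi-channel recordings conditioning a diffusion model over audio latents) can be sketched as follows. This is a minimal, hypothetical PyTorch sketch: the module names (`EEGEncoder`, `LatentDenoiser`), the channel count, the MLP denoiser, and the flattened latent shape are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only: assumes audio latents from a pretrained autoencoder,
# an EEG encoder over raw multi-channel EEG (no manual preprocessing or channel
# selection), and the standard epsilon-prediction diffusion objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EEGEncoder(nn.Module):
    """Maps raw EEG of shape (batch, channels, time) to a conditioning vector."""
    def __init__(self, n_channels=125, dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_channels, 128, kernel_size=7, stride=4, padding=3),
            nn.GELU(),
            nn.Conv1d(128, dim, kernel_size=7, stride=4, padding=3),
            nn.GELU(),
            nn.AdaptiveAvgPool1d(1),
        )

    def forward(self, eeg):
        return self.net(eeg).squeeze(-1)  # (batch, dim)

class LatentDenoiser(nn.Module):
    """Predicts the noise added to audio latents, conditioned on the EEG embedding."""
    def __init__(self, latent_dim=64, cond_dim=256, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim + 1, hidden),
            nn.GELU(),
            nn.Linear(hidden, hidden),
            nn.GELU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, z_noisy, t, cond):
        t = t.float().unsqueeze(-1) / 1000.0  # crude timestep embedding
        return self.net(torch.cat([z_noisy, cond, t], dim=-1))

def diffusion_training_step(denoiser, eeg_encoder, z_clean, eeg, alphas_cumprod):
    """One epsilon-prediction step: noise clean latents at a random timestep and
    regress the injected noise given the EEG conditioning."""
    b = z_clean.size(0)
    t = torch.randint(0, alphas_cumprod.size(0), (b,), device=z_clean.device)
    a_bar = alphas_cumprod[t].unsqueeze(-1)
    noise = torch.randn_like(z_clean)
    z_noisy = a_bar.sqrt() * z_clean + (1 - a_bar).sqrt() * noise
    eps_hat = denoiser(z_noisy, t, eeg_encoder(eeg))
    return F.mse_loss(eps_hat, noise)
```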
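The embedding-based evaluation and song classification mentioned in the abstract can likewise be sketched. The abstract does not name the embedding model, so `embed_audio` below is a crude spectral stand-in for a pretrained neural audio embedder, and the cosine-similarity metric plus nearest-reference classification are assumptions about how such metrics could be computed, not the paper's exact protocol.

```python
# Hypothetical evaluation sketch: compare generated and reference tracks in an
# embedding space, and classify a generated track as the closest reference song.
import numpy as np

def embed_audio(waveform: np.ndarray, dim: int = 128) -> np.ndarray:
    """Stand-in embedding: pooled log-magnitude spectrum, unit-normalized.
    A real evaluation would use a pretrained neural audio embedder instead."""
    spec = np.log1p(np.abs(np.fft.rfft(waveform)))
    bands = np.array_split(spec, dim)          # assumes len(spec) >= dim
    emb = np.array([b.mean() for b in bands])
    return emb / (np.linalg.norm(emb) + 1e-8)

def embedding_similarity(generated: np.ndarray, reference: np.ndarray) -> float:
    """Cosine similarity between embeddings of generated and ground-truth audio."""
    return float(embed_audio(generated) @ embed_audio(reference))

def classify_song(generated: np.ndarray, song_references: dict) -> str:
    """Assign a generated track to the song whose reference embedding is closest."""
    g = embed_audio(generated)
    scores = {name: float(g @ embed_audio(ref)) for name, ref in song_references.items()}
    return max(scores, key=scores.get)

# Toy usage with synthetic waveforms (placeholders for decoded and reference audio).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    refs = {f"song_{i}": rng.standard_normal(44100) for i in range(3)}
    gen = refs["song_1"] + 0.1 * rng.standard_normal(44100)
    print(embedding_similarity(gen, refs["song_1"]), classify_song(gen, refs))
```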