Naturalistic Music Decoding from EEG Data via Latent Diffusion Models

Abstract

In this article, we explore the potential of latent diffusion models, a family of powerful generative models, for reconstructing naturalistic music from electroencephalogram (EEG) recordings. Unlike simpler music with limited timbres, such as MIDI-generated tunes or monophonic pieces, the focus here is on intricate music featuring a diverse array of instruments, voices, and effects, rich in harmonics and timbre. This study represents an initial foray into high-quality general music reconstruction from non-invasive EEG data, employing an end-to-end training approach directly on raw data, without manual pre-processing or channel selection. We train our models on the public NMED-T dataset and propose neural embedding-based metrics for quantitative evaluation. We additionally perform song classification based on the generated tracks. Our work contributes to ongoing research in neural decoding and brain-computer interfaces, offering insights into the feasibility of using EEG data to reconstruct complex auditory information.
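The abstract does not spell out how the embedding-based evaluation or the song classification is computed. As a rough illustration only, the sketch below shows one way such a classification could work: each generated track is assigned to the reference song whose embedding is closest in cosine similarity. Everything here is an assumption made for illustration. The function names are hypothetical, the embeddings are random vectors standing in for the output of some pretrained audio encoder, and the number of songs and the embedding dimension are arbitrary.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def classify_by_nearest_reference(generated, references):
    """Assign each generated embedding to its most similar reference song."""
    sims = cosine_similarity_matrix(generated, references)
    return sims.argmax(axis=1)


# Hypothetical stand-in data: in practice these vectors would come from an
# audio embedding model applied to ground-truth and reconstructed tracks.
rng = np.random.default_rng(0)
n_songs, dim = 10, 128  # arbitrary sizes chosen for the illustration
reference_embeddings = rng.normal(size=(n_songs, dim))
true_labels = np.arange(n_songs)

# Simulate reconstructions as slightly perturbed copies of the references.
generated_embeddings = reference_embeddings + 0.1 * rng.normal(size=(n_songs, dim))

predicted = classify_by_nearest_reference(generated_embeddings, reference_embeddings)
accuracy = float((predicted == true_labels).mean())
print(f"classification accuracy: {accuracy:.2f}")
```

With real reconstructions the perturbation would be far larger than in this toy setup, so the accuracy would indicate how much song-specific information the decoded audio retains.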
