A Multimodal Symphony: Integrating Taste and Sound through Generative AI

Abstract

In recent decades, neuroscientific and psychological research has traced direct relationships between taste and auditory perception. Building on this foundational research, this article explores multimodal generative models capable of converting taste information into music. We briefly review the state of the art in this field, highlighting key findings and methodologies. We then present an experiment in which a fine-tuned version of a generative music model (MusicGEN) generates music from detailed taste descriptions provided for each musical piece. The results are promising: according to the participants' evaluation (n=111), the fine-tuned model produces music that reflects the input taste descriptions more coherently than the non-fine-tuned model. This study represents a significant step towards understanding and developing embodied interactions between AI, sound, and taste, opening new possibilities in the field of generative AI. We release our dataset, code, and pre-trained model at: https://osf.io/xs5jy/
