
A Multimodal Symphony: Integrating Taste and Sound through Generative AI

March 4, 2025
Authors: Matteo Spanio, Massimiliano Zampini, Antonio Rodà, Franco Pierucci
cs.AI

Abstract

In recent decades, neuroscientific and psychological research has traced direct relationships between taste and auditory perception. Building on this foundational research, this article explores multimodal generative models capable of converting taste information into music. We provide a brief review of the state of the art in this field, highlighting key findings and methodologies. We present an experiment in which a fine-tuned version of a generative music model (MusicGEN) is used to generate music based on detailed taste descriptions provided for each musical piece. The results are promising: according to the participants' (n=111) evaluation, the fine-tuned model produces music that more coherently reflects the input taste descriptions compared to the non-fine-tuned model. This study represents a significant step towards understanding and developing embodied interactions between AI, sound, and taste, opening new possibilities in the field of generative AI. We release our dataset, code, and pre-trained model at: https://osf.io/xs5jy/.
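As a minimal sketch of the generation step described above, the snippet below prompts a MusicGen model with a taste description using Meta's audiocraft library. The checkpoint name, prompt text, and output filename are illustrative assumptions, not the paper's exact pipeline; the authors' fine-tuned weights are released at https://osf.io/xs5jy/.

```python
# Sketch: text-conditioned music generation with MusicGen (audiocraft).
# Assumes a stock checkpoint; the paper fine-tunes such a model on
# taste-description/music pairs before generating.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load a pretrained MusicGen checkpoint (illustrative choice of size).
model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=10)  # length of generated audio, in seconds

# A detailed taste description used as the text condition (hypothetical prompt).
taste_prompt = "a sweet, smooth taste with a gentle sour aftertaste"
wav = model.generate([taste_prompt])  # tensor of shape [batch, channels, samples]

# Write the first generated sample to disk with loudness normalization.
audio_write("taste_music", wav[0].cpu(), model.sample_rate, strategy="loudness")
```

In the paper's experiment, outputs from a model fine-tuned on taste descriptions were rated by participants as more coherent with the prompts than outputs from the stock model used here.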
