DiffVox: A Differentiable Model for Capturing and Analysing Professional Effects Distributions

Abstract

This study introduces DiffVox, a novel and interpretable model for matching vocal effects in music production. DiffVox, short for "Differentiable Vocal Fx", integrates parametric equalisation, dynamic range control, delay, and reverb with efficient differentiable implementations to enable gradient-based optimisation for parameter estimation. Vocal presets are retrieved from two datasets, comprising 70 tracks from MedleyDB and 365 tracks from a private collection. Analysis of parameter correlations highlights strong relationships between effects and parameters: for example, the high-pass and low-shelf filters often work together to shape the low end, and the delay time correlates with the intensity of the delayed signals. Principal component analysis reveals connections to McAdams' timbre dimensions, where the most important component modulates the perceived spaciousness while the secondary components influence spectral brightness. Statistical testing confirms the non-Gaussian nature of the parameter distribution, highlighting the complexity of the vocal effects space. These initial findings on the parameter distributions lay the foundation for future research in vocal effects modelling and automatic mixing. Our source code and datasets are accessible at https://github.com/SonyResearch/diffvox.
