AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis

Abstract

Novel view acoustic synthesis (NVAS) aims to render binaural audio at any target viewpoint, given mono audio emitted by a sound source in a 3D scene. Existing methods have proposed NeRF-based implicit models that exploit visual cues as a condition for synthesizing binaural audio. However, in addition to the low efficiency caused by heavy NeRF rendering, these methods all have a limited ability to characterize the entire scene environment, such as room geometry, material properties, and the spatial relation between the listener and the sound source. To address these issues, we propose a novel Audio-Visual Gaussian Splatting (AV-GS) model. To obtain a material-aware and geometry-aware condition for audio synthesis, we learn an explicit point-based scene representation with an audio-guidance parameter on locally initialized Gaussian points, taking into account the spatial relation between the listener and the sound source. To make the visual scene model audio-adaptive, we propose a point densification and pruning strategy that distributes the Gaussian points according to their per-point contribution to sound propagation (e.g., more points are needed for texture-less wall surfaces, as they affect sound path diversion). Extensive experiments on the real-world RWAS and simulation-based SoundSpaces datasets validate the superiority of our AV-GS over existing alternatives.
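To make the densification and pruning idea concrete, the following is a minimal sketch and not the AV-GS implementation. The abstract does not say how a point's contribution to sound propagation is measured, so the sketch assumes a hypothetical per-point score in [0, 1] is already available. The function name `densify_and_prune`, the thresholds `prune_below` and `split_above`, and the `jitter` radius are all illustrative assumptions.

```python
import random


def densify_and_prune(points, contribution, prune_below=0.1,
                      split_above=0.8, jitter=0.01, seed=0):
    """Redistribute points using a per-point contribution score.

    points       : list of (x, y, z) tuples (point centers)
    contribution : list of floats, one per point (hypothetical score;
                   the abstract does not define how it is computed)
    All thresholds are illustrative assumptions, not values from the paper.
    """
    if len(points) != len(contribution):
        raise ValueError("points and contribution must have the same length")

    rng = random.Random(seed)
    kept = []
    for point, score in zip(points, contribution):
        if score < prune_below:
            # Prune: the point contributes little, so drop it.
            continue
        kept.append(point)
        if score > split_above:
            # Densify: add a slightly jittered clone next to a
            # high-contribution point.
            clone = tuple(v + rng.uniform(-jitter, jitter) for v in point)
            kept.append(clone)
    return kept


points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
scores = [0.05, 0.5, 0.9]
result = densify_and_prune(points, scores)
print(len(result))  # 3: first point pruned, second kept, third kept plus one clone
```

In this toy setup a low-scoring point is removed and a high-scoring point gains a neighbor, which mirrors the abstract's example that regions important for sound propagation, such as texture-less walls, should receive more points even when they carry little visual detail.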
