Rethinking RGB-Event Semantic Segmentation with a Novel Bidirectional Motion-enhanced Event Representation

Abstract

Event cameras capture motion dynamics, offering a unique modality with great potential in various computer vision tasks. However, RGB-Event fusion faces three intrinsic misalignments: (i) temporal, (ii) spatial, and (iii) modal. Existing voxel grid representations neglect temporal correlations between consecutive event windows, and their formulation, a simple accumulation of asynchronous and sparse events, is incompatible with the synchronous and dense nature of the RGB modality. To tackle these challenges, we propose a novel event representation, the Motion-enhanced Event Tensor (MET), which transforms sparse event voxels into a dense and temporally coherent form by leveraging dense optical flows and event temporal features. In addition, we introduce a Frequency-aware Bidirectional Flow Aggregation Module (BFAM) and a Temporal Fusion Module (TFM). BFAM leverages the frequency domain and MET to mitigate modal misalignment, while bidirectional flow aggregation and temporal fusion mechanisms resolve spatiotemporal misalignment. Experimental results on two large-scale datasets demonstrate that our framework significantly outperforms state-of-the-art RGB-Event semantic segmentation approaches. Our code is available at: https://github.com/zyaocoder/BRENet.
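For readers unfamiliar with the voxel grid baseline the abstract criticizes, the following is a minimal sketch of "simple accumulation" of events into temporal bins. It is an illustration only, not code from BRENet and not the MET representation: the `(timestamp, x, y, polarity)` event layout, the uniform binning, and the names `events_to_voxel_grid` and `density` are all assumptions made for this example.

```python
from typing import List, Tuple

# Assumed event layout: (timestamp, x, y, polarity), with polarity in {-1, +1}.
Event = Tuple[float, int, int, int]


def events_to_voxel_grid(events: List[Event], num_bins: int,
                         height: int, width: int) -> List[List[List[float]]]:
    """Sum signed polarities into a (num_bins, height, width) grid."""
    grid = [[[0.0] * width for _ in range(height)] for _ in range(num_bins)]
    if not events:
        return grid
    t0 = min(e[0] for e in events)
    t1 = max(e[0] for e in events)
    span = (t1 - t0) or 1.0  # avoid division by zero if all timestamps match
    for t, x, y, p in events:
        # Map the timestamp to a uniform temporal bin; clamp the latest
        # event into the last bin.
        b = min(int((t - t0) / span * num_bins), num_bins - 1)
        grid[b][y][x] += p
    return grid


def density(grid: List[List[List[float]]]) -> float:
    """Fraction of non-zero cells in the grid."""
    cells = [v for plane in grid for row in plane for v in row]
    return sum(1 for v in cells if v != 0) / len(cells)


# Hypothetical toy input: four events on a 4x4 sensor, split into two bins.
events = [(0.00, 1, 1, +1), (0.01, 1, 1, +1), (0.04, 2, 3, -1), (0.09, 0, 0, +1)]
grid = events_to_voxel_grid(events, num_bins=2, height=4, width=4)
```

Each window is binned independently of its neighbours and most cells stay at zero (here `density(grid)` is 3/32), which illustrates the sparsity and lack of cross-window temporal correlation that the abstract contrasts with dense, synchronous RGB frames.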
