Rethinking RGB-Event Semantic Segmentation with a Novel Bidirectional Motion-enhanced Event Representation
May 2, 2025
Authors: Zhen Yao, Xiaowen Ying, Mooi Choo Chuah
cs.AI
Abstract
Event cameras capture motion dynamics, offering a unique modality with great
potential in various computer vision tasks. However, RGB-Event fusion faces
three intrinsic misalignments: (i) temporal, (ii) spatial, and (iii) modal
misalignment. Existing voxel grid representations neglect temporal correlations
between consecutive event windows, and their formulation with simple
accumulation of asynchronous and sparse events is incompatible with the
synchronous and dense nature of the RGB modality. To tackle these challenges, we
propose a novel event representation, Motion-enhanced Event Tensor (MET), which
transforms sparse event voxels into a dense and temporally coherent form by
leveraging dense optical flows and event temporal features. In addition, we
introduce a Frequency-aware Bidirectional Flow Aggregation Module (BFAM) and a
Temporal Fusion Module (TFM). BFAM leverages the frequency domain and MET to
mitigate modal misalignment, while bidirectional flow aggregation and temporal
fusion mechanisms resolve spatiotemporal misalignment. Experimental results on
two large-scale datasets demonstrate that our framework significantly
outperforms state-of-the-art RGB-Event semantic segmentation approaches. Our
code is available at: https://github.com/zyaocoder/BRENet.
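To make the abstract's critique concrete, the sketch below shows the standard voxel grid representation it refers to: asynchronous events (x, y, t, polarity) are simply summed into temporal bins, which discards correlations between consecutive windows. The `warp_bins_with_flow` helper is a toy illustration of the general idea of using dense optical flow to align sparse event bins with a frame timestamp; it is an assumption for illustration only and is not the paper's actual MET, BFAM, or TFM formulation.

```python
import numpy as np

def event_voxel_grid(events, num_bins, H, W):
    """Accumulate asynchronous events (x, y, t, p) into a voxel grid.

    This is the standard formulation the abstract critiques: polarities
    are simply summed into temporal bins, so correlations between
    consecutive event windows are lost.
    """
    x, y, t, p = events.T
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)  # normalize to [0, 1]
    b = np.clip((t_norm * num_bins).astype(int), 0, num_bins - 1)
    grid = np.zeros((num_bins, H, W), dtype=np.float32)
    # Unbuffered scatter-add of polarities into (bin, y, x) cells.
    np.add.at(grid, (b, y.astype(int), x.astype(int)), p)
    return grid

def warp_bins_with_flow(grid, flow):
    """Toy densification: shift each temporal bin along a dense optical
    flow field (H, W, 2) so events line up with the frame timestamp.
    Nearest-neighbor warping; purely illustrative.
    """
    num_bins, H, W = grid.shape
    ys, xs = np.mgrid[0:H, 0:W]
    out = np.zeros_like(grid)
    for b in range(num_bins):
        alpha = b / max(num_bins - 1, 1)  # later bins travel further
        xw = np.clip((xs + alpha * flow[..., 0]).round().astype(int), 0, W - 1)
        yw = np.clip((ys + alpha * flow[..., 1]).round().astype(int), 0, H - 1)
        out[b, yw, xw] = grid[b]
    return out
```

With a zero flow field the warp is an identity, which makes the baseline behavior easy to check; the paper's contribution lies in replacing this naive accumulation-plus-warp with a dense, temporally coherent tensor.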