Rethinking RGB-Event Semantic Segmentation with a Novel Bidirectional Motion-enhanced Event Representation
May 2, 2025
Authors: Zhen Yao, Xiaowen Ying, Mooi Choo Chuah
cs.AI
Abstract
Event cameras capture motion dynamics, offering a unique modality with great
potential in various computer vision tasks. However, RGB-Event fusion faces
three intrinsic misalignments: (i) temporal, (ii) spatial, and (iii) modal
misalignment. Existing voxel grid representations neglect temporal correlations
between consecutive event windows, and their formulation with simple
accumulation of asynchronous and sparse events is incompatible with the
synchronous and dense nature of the RGB modality. To tackle these challenges, we
propose a novel event representation, Motion-enhanced Event Tensor (MET), which
transforms sparse event voxels into a dense and temporally coherent form by
leveraging dense optical flows and event temporal features. In addition, we
introduce a Frequency-aware Bidirectional Flow Aggregation Module (BFAM) and a
Temporal Fusion Module (TFM). BFAM leverages the frequency domain and MET to
mitigate modal misalignment, while bidirectional flow aggregation and temporal
fusion mechanisms resolve spatiotemporal misalignment. Experimental results on
two large-scale datasets demonstrate that our framework significantly
outperforms state-of-the-art RGB-Event semantic segmentation approaches. Our
code is available at: https://github.com/zyaocoder/BRENet.
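
The abstract contrasts MET with the conventional event voxel grid, which simply accumulates asynchronous events into a fixed number of temporal bins. For readers unfamiliar with that baseline, the sketch below shows one common way such a voxel grid is built, splitting each event's polarity between its two nearest temporal bins. The function name, the (timestamp, x, y, polarity) event layout, and the use of NumPy are illustrative assumptions; this reproduces the standard representation the abstract critiques, not the paper's MET or the BRENet code.

```python
import numpy as np

def events_to_voxel_grid(events, num_bins, height, width):
    """Accumulate events of the form (t, x, y, polarity) into a voxel grid
    with `num_bins` temporal channels, splitting each event's polarity
    between its two nearest temporal bins (bilinear interpolation in time).
    Assumes pixel coordinates are within the sensor resolution.
    """
    voxel = np.zeros((num_bins, height, width), dtype=np.float32)
    if len(events) == 0:
        return voxel

    t = events[:, 0].astype(np.float64)
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    p = np.where(events[:, 3] > 0, 1.0, -1.0)  # map polarity to {-1, +1}

    # Normalize timestamps into [0, num_bins - 1].
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9) * (num_bins - 1)
    bin_lo = np.floor(t_norm).astype(int)
    frac = t_norm - bin_lo

    # Distribute each event's polarity over the two adjacent temporal bins.
    np.add.at(voxel, (bin_lo, y, x), p * (1.0 - frac))
    bin_hi = np.clip(bin_lo + 1, 0, num_bins - 1)
    np.add.at(voxel, (bin_hi, y, x), p * frac)
    return voxel

# Example: three events (timestamp, x, y, polarity) on a 4x4 sensor.
ev = np.array([[0.00, 1, 2,  1],
               [0.05, 1, 2, -1],
               [0.10, 3, 0,  1]])
grid = events_to_voxel_grid(ev, num_bins=5, height=4, width=4)
print(grid.shape)  # (5, 4, 4)
```

Because each event contributes only to its own spatial location and temporal bin, the resulting tensor remains sparse and carries no correlation across consecutive event windows; this is the incompatibility with the dense, synchronous RGB frames that the proposed MET representation is designed to address.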