Rethinking RGB-Event Semantic Segmentation with a Novel Bidirectional Motion-enhanced Event Representation

Abstract

Event cameras capture motion dynamics, offering a unique modality with great potential for a variety of computer vision tasks. However, RGB-Event fusion faces three intrinsic misalignments: (i) temporal, (ii) spatial, and (iii) modal. Existing voxel grid representations neglect temporal correlations between consecutive event windows, and because they are built by simply accumulating asynchronous, sparse events, they are incompatible with the synchronous, dense nature of the RGB modality. To tackle these challenges, we propose a novel event representation, the Motion-enhanced Event Tensor (MET), which transforms sparse event voxels into a dense and temporally coherent form by leveraging dense optical flow and event temporal features. In addition, we introduce a Frequency-aware Bidirectional Flow Aggregation Module (BFAM) and a Temporal Fusion Module (TFM). BFAM leverages frequency-domain information and MET to mitigate modal misalignment, while the bidirectional flow aggregation and temporal fusion mechanisms resolve spatiotemporal misalignment. Experimental results on two large-scale datasets demonstrate that our framework significantly outperforms state-of-the-art RGB-Event semantic segmentation approaches. Our code is available at: https://github.com/zyaocoder/BRENet.
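For readers unfamiliar with the voxel grid baseline mentioned above, the following is a minimal sketch of one common way such a grid is built: event polarities are accumulated into a fixed number of time bins, with each event split linearly between its two nearest bins. This is not the paper's MET, and it is not taken from the BRENet code; the function name `events_to_voxel_grid`, the `(x, y, t, polarity)` tuple layout, and the bin count are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

# Assumed event layout: (x, y, timestamp, polarity), polarity in {-1, +1}.
Event = Tuple[int, int, float, int]


def events_to_voxel_grid(
    events: Sequence[Event], num_bins: int, height: int, width: int
) -> List[List[List[float]]]:
    """Accumulate sparse events into a [num_bins][height][width] grid."""
    grid = [[[0.0] * width for _ in range(height)] for _ in range(num_bins)]
    if not events:
        return grid

    t_start = min(e[2] for e in events)
    t_end = max(e[2] for e in events)
    span = t_end - t_start

    for x, y, t, polarity in events:
        # Map the timestamp onto the continuous bin axis [0, num_bins - 1].
        t_norm = (t - t_start) / span * (num_bins - 1) if span > 0 else 0.0
        lower = int(t_norm)
        weight_upper = t_norm - lower
        # Split the polarity between the two neighbouring time bins.
        grid[lower][y][x] += polarity * (1.0 - weight_upper)
        if lower + 1 < num_bins:
            grid[lower + 1][y][x] += polarity * weight_upper
    return grid


def count_nonzero(grid: List[List[List[float]]]) -> int:
    """Number of non-zero cells, a rough measure of how sparse the grid is."""
    return sum(1 for plane in grid for row in plane for v in row if v != 0.0)


if __name__ == "__main__":
    sample = [(0, 0, 0.0, 1), (1, 0, 0.5, -1), (1, 1, 1.0, 1)]
    voxels = events_to_voxel_grid(sample, num_bins=3, height=2, width=2)
    print(count_nonzero(voxels), "of", 3 * 2 * 2, "cells are non-zero")
```

In this toy example most cells stay at zero, and each window is processed independently of its neighbours, which illustrates the sparsity and the missing cross-window temporal correlation that the abstract identifies as limitations.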
