Rethinking RGB-Event Semantic Segmentation with a Novel Bidirectional Motion-enhanced Event Representation

Abstract

Event cameras capture motion dynamics, offering a unique modality with great potential in various computer vision tasks. However, RGB-Event fusion faces three intrinsic misalignments: (i) temporal, (ii) spatial, and (iii) modal misalignment. Existing voxel grid representations neglect temporal correlations between consecutive event windows, and their formulation, a simple accumulation of asynchronous and sparse events, is incompatible with the synchronous and dense nature of the RGB modality. To tackle these challenges, we propose a novel event representation, the Motion-enhanced Event Tensor (MET), which transforms sparse event voxels into a dense and temporally coherent form by leveraging dense optical flow and temporal event features. In addition, we introduce a Frequency-aware Bidirectional Flow Aggregation Module (BFAM) and a Temporal Fusion Module (TFM). BFAM leverages the frequency domain and MET to mitigate modal misalignment, while the bidirectional flow aggregation and temporal fusion mechanisms resolve spatiotemporal misalignment. Experimental results on two large-scale datasets demonstrate that our framework significantly outperforms state-of-the-art RGB-Event semantic segmentation approaches. Our code is available at: https://github.com/zyaocoder/BRENet.
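For reference, the conventional event voxel grid that MET is contrasted with can be sketched as follows. This is a minimal sketch assuming the commonly used formulation in which timestamps are normalized to the bin range and each event's polarity (+1/-1) is split linearly between the two nearest temporal bins. It is not the MET representation and not code from the BRENet repository; the function name `events_to_voxel_grid` and its parameters are hypothetical.

```python
import numpy as np


def events_to_voxel_grid(xs, ys, ts, ps, num_bins, height, width):
    """Accumulate one event window into a (num_bins, height, width) grid.

    Hypothetical helper illustrating the conventional voxel grid (not MET):
    each event's polarity is split between the two nearest temporal bins
    with linear weights.
    """
    xs = np.asarray(xs, dtype=np.int64)
    ys = np.asarray(ys, dtype=np.int64)
    ts = np.asarray(ts, dtype=np.float64)
    ps = np.asarray(ps, dtype=np.float64)
    voxel = np.zeros((num_bins, height, width), dtype=np.float64)
    if ts.size == 0:
        return voxel  # empty window: nothing to accumulate

    # Normalize timestamps to the continuous range [0, num_bins - 1].
    duration = ts.max() - ts.min()
    if duration > 0:
        t_norm = (num_bins - 1) * (ts - ts.min()) / duration
    else:
        t_norm = np.zeros_like(ts)

    left = np.floor(t_norm).astype(np.int64)
    frac = t_norm - left
    right = np.minimum(left + 1, num_bins - 1)

    # np.add.at accumulates correctly when several events hit the same
    # (bin, y, x) cell, unlike plain fancy-index assignment.
    np.add.at(voxel, (left, ys, xs), ps * (1.0 - frac))
    np.add.at(voxel, (right, ys, xs), ps * frac)
    return voxel


if __name__ == "__main__":
    grid = events_to_voxel_grid(
        xs=[0, 1, 1], ys=[0, 0, 1], ts=[0.0, 0.25, 1.0], ps=[1, -1, 1],
        num_bins=3, height=2, width=2,
    )
    print(grid.shape)  # (3, 2, 2)
    print(np.count_nonzero(grid), "of", grid.size, "cells are non-zero")  # 4 of 12
```

Each event touches at most two cells, so a short window leaves most of the grid at zero, and each window is built independently of its neighbors. These are the sparsity and missing cross-window temporal correlation that the abstract identifies as limitations of this representation.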
