Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence
May 15, 2025
Authors: Xiang He, Dongcheng Zhao, Yang Li, Qingqun Kong, Xin Yang, Yi Zeng
cs.AI
Abstract
Multimodal learning enhances the perceptual capabilities of cognitive systems
by integrating information from different sensory modalities. However, existing
multimodal fusion research typically assumes static integration, without fully
incorporating the key dynamic mechanisms found in the brain. Specifically, the
brain exhibits an inverse effectiveness phenomenon, wherein weaker unimodal
cues yield stronger multisensory integration benefits; conversely, when
individual modal cues are stronger, the effect of fusion is diminished. This
mechanism enables biological systems to achieve robust cognition even with
scarce or noisy perceptual cues. Inspired by this biological mechanism, we
explore the relationship between multimodal output and information from
individual modalities, proposing an inverse effectiveness driven multimodal
fusion (IEMF) strategy. By incorporating this strategy into neural networks, we
achieve more efficient integration with improved model performance and
computational efficiency, demonstrating up to 50% reduction in computational
cost across diverse fusion methods. We conduct experiments on audio-visual
classification, continual learning, and question answering tasks to validate
our method. Results consistently show that our method performs strongly
across these tasks. To verify universality and generalization, we also
conduct experiments on Artificial Neural Networks (ANN) and Spiking Neural
Networks (SNN), with results showing good adaptability to both network types.
Our research emphasizes the potential of incorporating biologically inspired
mechanisms into multimodal networks and provides promising directions for the
future development of multimodal artificial intelligence. The code is available
at https://github.com/Brain-Cog-Lab/IEMF.
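The inverse-effectiveness mechanism described in the abstract can be illustrated with a toy fusion rule: gate the fused multisensory term by how weak the unimodal cues are. The sketch below is a minimal illustration under our own assumptions (the function names, the peak-softmax confidence proxy, and the gating formula are all ours), not the paper's IEMF implementation:

```python
import math

def softmax_peak(logits):
    """Peak softmax probability: a crude proxy for unimodal cue strength."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return max(exps) / sum(exps)

def inverse_effectiveness_fusion(audio_logits, visual_logits):
    """Toy fusion rule inspired by inverse effectiveness: the weaker the
    unimodal evidence, the more the fused multisensory term contributes.
    Illustrative sketch only -- not the paper's IEMF algorithm."""
    c_a = softmax_peak(audio_logits)
    c_v = softmax_peak(visual_logits)
    # The gate shrinks as unimodal confidence grows (inverse effectiveness):
    # strong individual cues -> lean on the strongest modality alone.
    gate = 1.0 - max(c_a, c_v)
    fused = [a + v for a, v in zip(audio_logits, visual_logits)]  # naive additive fusion
    strongest = audio_logits if c_a >= c_v else visual_logits
    return [gate * f + (1.0 - gate) * s for f, s in zip(fused, strongest)]
```

With weak, noisy cues the gate approaches its maximum and the additive fusion term dominates; with one confident modality the gate collapses and the output tracks that modality, mirroring the inverse-effectiveness behavior described above.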