ChatPaper.ai


Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence

May 15, 2025
Authors: Xiang He, Dongcheng Zhao, Yang Li, Qingqun Kong, Xin Yang, Yi Zeng
cs.AI

Abstract

Multimodal learning enhances the perceptual capabilities of cognitive systems by integrating information from different sensory modalities. However, existing multimodal fusion research typically assumes static integration and does not fully incorporate key dynamic mechanisms found in the brain. Specifically, the brain exhibits an inverse effectiveness phenomenon, wherein weaker unimodal cues yield stronger multisensory integration benefits; conversely, when individual modal cues are stronger, the benefit of fusion diminishes. This mechanism enables biological systems to achieve robust cognition even with scarce or noisy perceptual cues. Inspired by this biological mechanism, we explore the relationship between multimodal output and information from individual modalities, proposing an inverse effectiveness driven multimodal fusion (IEMF) strategy. By incorporating this strategy into neural networks, we achieve more efficient integration with improved model performance and computational efficiency, demonstrating up to a 50% reduction in computational cost across diverse fusion methods. We conduct experiments on audio-visual classification, continual learning, and question answering tasks to validate our method. Results consistently show strong performance across these tasks. To verify universality and generalization, we also conduct experiments on Artificial Neural Networks (ANN) and Spiking Neural Networks (SNN), with results showing good adaptability to both network types. Our research highlights the potential of incorporating biologically inspired mechanisms into multimodal networks and provides promising directions for the future development of multimodal artificial intelligence. The code is available at https://github.com/Brain-Cog-Lab/IEMF.
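The core idea of inverse effectiveness — scaling the contribution of multimodal fusion up when unimodal cues are weak and down when they are strong — can be illustrated with a minimal sketch. This is a hypothetical toy implementation for intuition only, not the authors' IEMF method (see the linked repository for that); the confidence proxy, the linear gain, and the function names below are all assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def iemf_gain(audio_logits, visual_logits):
    """Toy inverse-effectiveness gain: use the mean of the two
    unimodal max-softmax confidences as a cue-strength proxy,
    so weak/noisy cues -> large fusion gain, strong cues -> small gain."""
    conf_a = softmax(audio_logits).max()
    conf_v = softmax(visual_logits).max()
    strength = 0.5 * (conf_a + conf_v)
    return 1.0 - strength  # in [0, 1); decreases as cues strengthen

def iemf_fuse(audio_logits, visual_logits):
    """Blend a naive additive fusion with the stronger unimodal stream,
    weighting the fused term by the inverse-effectiveness gain."""
    gain = iemf_gain(audio_logits, visual_logits)
    fused = 0.5 * (audio_logits + visual_logits)
    # Fall back toward the more confident single modality when cues are strong.
    stronger = (audio_logits
                if softmax(audio_logits).max() >= softmax(visual_logits).max()
                else visual_logits)
    return gain * fused + (1.0 - gain) * stronger

# Weak (uniform) cues receive a larger fusion gain than strong (peaked) ones.
g_weak = iemf_gain(np.zeros(4), np.zeros(4))
g_strong = iemf_gain(np.array([10.0, 0, 0, 0]), np.array([10.0, 0, 0, 0]))
```

Under this sketch, near-uniform logits (low confidence) drive the gain toward 1, letting the fused representation dominate, while sharply peaked logits drive it toward 0, matching the biological observation that multisensory integration benefits are largest when individual cues are weak.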

