Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence

Abstract

Multimodal learning enhances the perceptual capabilities of cognitive systems by integrating information from different sensory modalities. However, existing multimodal fusion research typically assumes static integration and does not fully incorporate key dynamic mechanisms found in the brain. In particular, the brain exhibits inverse effectiveness: weaker unimodal cues yield larger multisensory integration benefits, whereas stronger unimodal cues diminish the benefit of fusion. This mechanism enables biological systems to achieve robust cognition even when perceptual cues are scarce or noisy. Inspired by this mechanism, we explore the relationship between the multimodal output and the information carried by each individual modality, and propose an inverse effectiveness driven multimodal fusion (IEMF) strategy. Incorporating this strategy into neural networks yields more efficient integration, improving model performance while reducing computational cost by up to 50% across diverse fusion methods. We validate our method on audio-visual classification, continual learning, and question answering tasks, and the results consistently show strong performance on all three. To verify its universality and generalization, we also apply it to both Artificial Neural Networks (ANNs) and Spiking Neural Networks (SNNs), and it adapts well to both network types. Our research highlights the potential of incorporating biologically inspired mechanisms into multimodal networks and suggests promising directions for the future development of multimodal artificial intelligence. The code is available at https://github.com/Brain-Cog-Lab/IEMF.
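The abstract states the inverse effectiveness principle but not its formula. As a toy illustration only, the Python sketch below assumes that a unimodal cue's "strength" can be measured as the mean probability its classifier head assigns to the true class, and that fusion is emphasized by scaling the fused-prediction loss with a weight that grows as the strongest unimodal cue weakens. This is not the IEMF rule from the paper or from the Brain-Cog-Lab/IEMF repository; the function names, the `alpha` parameter, and the strength measure are all hypothetical.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def unimodal_strength(logits: np.ndarray, labels: np.ndarray) -> float:
    # Hypothetical strength measure: mean probability a unimodal head assigns
    # to the true class (near 1.0 = strong cue, 1/num_classes = uninformative).
    probs = softmax(logits)
    return float(probs[np.arange(len(labels)), labels].mean())


def inverse_effectiveness_weight(audio_logits, visual_logits, labels, alpha=1.0):
    # Hypothetical rule: if even the strongest unimodal cue is weak, fusion gets
    # a larger weight; if one cue is already strong, the weight stays near 1.
    strongest = max(unimodal_strength(audio_logits, labels),
                    unimodal_strength(visual_logits, labels))
    return 1.0 + alpha * (1.0 - strongest)


def weighted_fusion_loss(fused_logits, audio_logits, visual_logits, labels, alpha=1.0):
    # Cross-entropy of the fused prediction, scaled by the hypothetical weight.
    probs = softmax(fused_logits)
    ce = -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()
    return inverse_effectiveness_weight(audio_logits, visual_logits, labels, alpha) * ce


# Toy usage: identical fused logits, but strong vs. uninformative unimodal heads.
labels = np.array([0, 1])
strong = np.array([[10.0, 0.0], [0.0, 10.0]])
weak = np.zeros((2, 2))
fused = np.array([[2.0, 0.0], [0.0, 2.0]])

print(inverse_effectiveness_weight(strong, strong, labels))  # close to 1.0
print(inverse_effectiveness_weight(weak, weak, labels))      # 1.5
print(weighted_fusion_loss(fused, weak, weak, labels)
      > weighted_fusion_loss(fused, strong, strong, labels))  # True
```

Taking the maximum of the two strengths encodes the idea that a single strong cue is enough to make fusion less beneficial; averaging the strengths would be an equally plausible hypothetical choice.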