Multimodal Spiking-Mixer with a robustness-improved ODE neuron
- Propose the first vision-language multimodal Spiking Neural Network (SNN) for image captioning
- Show that an SNN can be viewed as a Neural ODE, and gain adversarial robustness by analyzing the stability of that ODE
- Extend multimodal vision-language adversarial attacks to the SNN domain and demonstrate their effectiveness compared to
a naive unimodal implementation
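
To illustrate the ODE view of a spiking neuron mentioned above, here is a minimal sketch. It is not the project's actual neuron: it assumes a standard leaky integrate-and-fire (LIF) model, tau * dv/dt = -v + I(t), discretized with forward Euler and a hard reset. The names `lif_euler`, `euler_gain`, `is_stable`, and `perturbation_decay` are hypothetical helpers introduced only for this example. Below threshold, a perturbation of the membrane potential is scaled by |1 - dt/tau| at every step, so the discretized dynamics damp perturbations when that factor is below 1.

```python
import numpy as np


def lif_euler(inputs, tau=2.0, dt=1.0, v_th=1.0, v0=0.0):
    """Forward-Euler LIF neuron (assumed model): tau * dv/dt = -v + I(t)."""
    v = v0
    spikes, trace = [], []
    for i_t in inputs:
        # One Euler step of the membrane ODE.
        v = v + (dt / tau) * (-v + i_t)
        # Emit a spike when the threshold is reached.
        s = 1.0 if v >= v_th else 0.0
        spikes.append(s)
        # Hard reset to 0 after a spike.
        v = v * (1.0 - s)
        trace.append(v)
    return np.array(spikes), np.array(trace)


def euler_gain(tau, dt):
    """Per-step scaling of a sub-threshold membrane perturbation."""
    return abs(1.0 - dt / tau)


def is_stable(tau, dt):
    """True when sub-threshold perturbations shrink at every step."""
    return euler_gain(tau, dt) < 1.0


def perturbation_decay(eps, tau, dt, steps):
    """Gap between a clean and a perturbed sub-threshold trajectory."""
    inputs = np.zeros(steps)
    _, clean = lif_euler(inputs, tau, dt, v0=0.2)
    _, perturbed = lif_euler(inputs, tau, dt, v0=0.2 + eps)
    return np.abs(perturbed - clean)
```

With tau = 2 and dt = 1 the gain is 0.5, so an initial perturbation halves at each step; with tau = 0.4 and dt = 1 the gain is 1.5 and the same discretization amplifies perturbations instead. This covers only the sub-threshold linear regime; the spike and reset nonlinearity would need a separate analysis.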