Ok_Construction470 t1_ixlephi wrote on November 24, 2022 at 8:55 AM

Note that spectrograms are NOT images though; their element values can be negative (for example, when they are on a log/dB scale), whereas image pixel values can't be

Having said that, I work in the audio domain and have applied a computer vision transformer, the Swin (Shifted Window) Transformer, to audio, specifically to spectrograms extracted from the raw waveform.
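
To make the point about negative values concrete, here is a minimal sketch, not my actual pipeline. It assumes a log-magnitude (dB) spectrogram built from a plain framed FFT in NumPy; the helper names (`log_spectrogram`, `to_unit_range`), the FFT size, the hop length and the 440 Hz test tone are all made up for illustration.

```python
import numpy as np

def log_spectrogram(waveform, n_fft=256, hop=128, eps=1e-10):
    """Log-magnitude (dB) spectrogram from Hann-windowed FFT frames.

    n_fft, hop and eps are illustrative defaults, not recommended settings.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(waveform) - n_fft) // hop
    # Slice the waveform into overlapping frames and window each one.
    frames = np.stack([
        waveform[i * hop : i * hop + n_fft] * window
        for i in range(n_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (time, freq)
    # Magnitudes below 1 map to negative dB values.
    return 20.0 * np.log10(magnitude + eps).T        # (freq, time)

def to_unit_range(spec):
    """Min-max rescale to [0, 1], one possible way to get image-like values."""
    lo, hi = spec.min(), spec.max()
    return (spec - lo) / (hi - lo)

# Hypothetical input: one second of a quiet 440 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
waveform = 0.1 * np.sin(2 * np.pi * 440.0 * t)

spec = log_spectrogram(waveform)
image_like = to_unit_range(spec)

print(spec.shape)            # (frequency bins, time frames)
print(spec.min() < 0)        # the dB spectrogram contains negative entries
print(image_like.min(), image_like.max())
```

The min-max rescaling is only one option for feeding a vision model; whether you rescale, standardise, or pass the dB values through as they are is a design choice.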