
Ok_Construction470 t1_ixlephi wrote

Note that spectrograms are NOT images though; their element values can be negative, but image pixel values can't

Having said that, I work in the audio domain and have applied a computer vision transformer, the Swin (Shifted Window) Transformer, to audio, specifically to spectrograms extracted from the raw waveform

This was the OG paper: https://arxiv.org/abs/2202.00874

They used the pretrained model too
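
To make the negative-values point concrete, here is a minimal sketch of pulling a log-magnitude spectrogram out of a raw waveform with plain NumPy. It is not the pipeline from the linked paper: the function name `log_spectrogram`, the FFT size, the hop length, and the synthetic 440 Hz test tone are all assumptions picked for illustration.

```python
import numpy as np

def log_spectrogram(waveform, n_fft=256, hop=128, eps=1e-10):
    """Hypothetical helper: log-magnitude STFT in dB, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(waveform) - n_fft) // hop
    # Slice the waveform into overlapping, windowed frames.
    frames = np.stack([
        waveform[i * hop : i * hop + n_fft] * window
        for i in range(n_frames)
    ])
    # Magnitude of the one-sided FFT; transpose to (frequency, time).
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    # eps avoids log(0); bins with magnitude below 1 come out negative in dB.
    return 20.0 * np.log10(magnitude + eps)

# Synthetic input (assumption): one second of a quiet 440 Hz sine at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
waveform = 0.1 * np.sin(2 * np.pi * 440.0 * t)

spec = log_spectrogram(waveform)
print(spec.shape, spec.min(), spec.max())
```

The result is a 2D array like a single-channel image, but most of its values sit well below zero, which is the difference from pixel data.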

2

hadaev t1_ixltjo0 wrote

Just rescale it to [-1, 1] like people do for images
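
A minimal sketch of that rescaling, assuming simple per-spectrogram min-max scaling. The helper name `rescale_to_unit_range` and the toy dB values are made up for illustration; scaling with fixed dataset-wide statistics would be an equally reasonable choice.

```python
import numpy as np

def rescale_to_unit_range(spec, eps=1e-8):
    """Hypothetical helper: min-max scale an array to roughly [-1, 1]."""
    lo, hi = spec.min(), spec.max()
    # Map [lo, hi] to [0, 1], then stretch and shift to [-1, 1].
    # eps guards against division by zero for a constant input.
    return 2.0 * (spec - lo) / (hi - lo + eps) - 1.0

# Toy log-magnitude spectrogram in dB (assumed values).
spec_db = np.array([[-80.0, -40.0],
                    [-20.0,   0.0]])

scaled = rescale_to_unit_range(spec_db)
print(scaled.min(), scaled.max())
```

Because of the `eps` guard, the maximum lands a hair below 1 rather than exactly on it.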

1