Submitted by Oceanboi t3_z30bf2 in MachineLearning
Ok_Construction470 t1_ixlephi wrote
Note that spectrograms are NOT images, though: the element values (e.g. of a log- or dB-scaled spectrogram) can be negative, whereas image pixel values can't.
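Whether a spectrogram has negative values depends on how it is scaled. A magnitude or power spectrogram is non-negative, while the log/dB-scaled version that is commonly fed to models is not. A minimal numpy sketch, assuming a Hann-windowed STFT with hypothetical parameters (`n_fft=256`, `hop=128`) and a synthetic 440 Hz test tone in place of real audio:

```python
import numpy as np

def power_spectrogram(x, n_fft=256, hop=128):
    """Power spectrogram from a Hann-windowed STFT (plain numpy)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    # |STFT|^2, shape (frames, n_fft // 2 + 1); squared magnitudes are never negative
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def to_db(power, ref=1.0, eps=1e-10):
    """Convert power to decibels: 10 * log10(power / ref). Values below ref become negative."""
    return 10.0 * np.log10(np.maximum(power, eps) / ref)

sr = 16000
t = np.arange(sr) / sr
x = 0.1 * np.sin(2 * np.pi * 440 * t)  # hypothetical 1 s test tone at 16 kHz

power = power_spectrogram(x)
db = to_db(power)
print(power.min() >= 0)  # power spectrogram: non-negative, like image pixels
print(db.min() < 0)      # dB spectrogram: low-energy bins go negative
```

The sign issue only appears after the log step, so it is the scaling, not the STFT itself, that makes the values differ from an image's [0, 255] or [0, 1] range.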
Having said that, I work in the audio domain and have applied a computer vision transformer, the Shifted Window (Swin) Transformer, to audio, specifically to spectrograms extracted from the raw waveform.
This was the OG paper: https://arxiv.org/abs/2202.00874
They used the pretrained Swin model, too.
hadaev t1_ixltjo0 wrote
Just rescale it to [-1, 1], like people do for images.
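A minimal sketch of that rescaling in numpy, using min-max scaling 2 * (x - lo) / (hi - lo) - 1. The function name `rescale_to_unit_range` and the toy dB values are hypothetical; `lo`/`hi` default to the array's own min/max, but can be fixed (e.g. computed once over a training set) so every example is scaled the same way:

```python
import numpy as np

def rescale_to_unit_range(spec, lo=None, hi=None):
    """Min-max rescale to [-1, 1] via 2 * (x - lo) / (hi - lo) - 1.

    lo/hi default to this array's own min/max. Values outside [lo, hi]
    (possible when fixed dataset-level bounds are passed) are clipped.
    """
    spec = np.asarray(spec, dtype=np.float64)
    lo = spec.min() if lo is None else lo
    hi = spec.max() if hi is None else hi
    if hi == lo:
        return np.zeros_like(spec)  # constant input: map everything to 0
    return np.clip(2.0 * (spec - lo) / (hi - lo) - 1.0, -1.0, 1.0)

db = np.array([[-80.0, -40.0], [0.0, 20.0]])  # toy dB values
out = rescale_to_unit_range(db)               # approx [[-1.0, -0.2], [0.6, 1.0]]

# The usual image equivalent for uint8 pixels in [0, 255]:
img = np.array([0, 128, 255], dtype=np.uint8)
img_scaled = img / 127.5 - 1.0                # 0 -> -1.0, 255 -> 1.0
```

Per-example min/max is the simplest choice, but it may stretch near-silent clips to the full range; fixed dataset-level bounds keep loudness differences between examples intact.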