albertzeyer t1_j6ebian wrote on January 29, 2023 at 7:18 PM

Reply to comment by JustOneAvailableName in [D] Why are there no End2End Speech Recognition models using the same Encoder-Decoder learning process as BART (no CTC) ? by KarmaCut132

It is indeed a bit strange that the GCP and Azure results are not that great. As I said, I actually do research on speech recognition, and Google is probably the biggest player in this field, usually with the very best results.

My explanation is that they don't actually use their best and biggest models for GCP. Maybe they want to reduce the computational cost as much as possible.

But you also have to be a bit careful about what you compare. Your results might be biased in favor of your own model when your finetuning data is close to your validation set (e.g. similar domain, similar sound conditions), because GCP uses very generic models that are meant to work for all kinds of domains and all kinds of conditions.
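One way to check for this kind of bias is to report the word error rate (WER) separately per domain, so that an in-domain validation set does not hide how the models compare on other data. Below is a minimal sketch of that idea. The function names and the `(domain, reference, hypothesis)` sample layout are my own assumptions for illustration, not part of any particular toolkit, and real evaluations usually also normalize the text (casing, punctuation) first.

```python
from collections import defaultdict


def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion (reference word missing)
                cur[j - 1] + 1,          # insertion (extra hypothesis word)
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1], len(ref)


def wer_by_domain(samples):
    """samples: iterable of (domain, reference, hypothesis) triples.

    Hypothetical layout: the domain label marks e.g. 'in-domain' vs.
    'out-of-domain' relative to the finetuning data.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for domain, ref, hyp in samples:
        e, n = word_errors(ref, hyp)
        errors[domain] += e
        words[domain] += n
    # Pool errors over all words of a domain instead of averaging per utterance.
    return {d: errors[d] / words[d] for d in errors if words[d] > 0}
```

If you run both your finetuned model and the generic cloud model through `wer_by_domain` on the same samples, a gap that only shows up on the in-domain subset would suggest that the comparison mostly reflects the domain match rather than the model quality.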