Viewing a single comment thread. View all comments

albertzeyer t1_j6ebian wrote

It's a bit strange indeed that the GCP or Azure results are not so great. As said, I do actually research on speech recognition, and Google is probably the biggest player in this field, and usually always with the very best results.

My explanation is, they don't really use such good and big models for GCP. Maybe they want to reduce the computational cost as much as possible.

But you also anyway have to be a bit careful in what you compare. Your results might be flawed when your finetuning data is close to your validation set (e.g. similar domain, similar sound conditions). Because in case of GCP, they have very generic models, working for all kinds of domains, all kinds of conditions.

1