Submitted by Fine-Topic-6127 t3_119ydqv in MachineLearning
A simple question really, but one that's pretty difficult to find an answer to:
Has anyone done much research into the performance of models vs their size as a function of the output space (and if so, where can I find it)? Basically, it's quite clear that for most applications, generalisability of a model can be achieved either by improving the dataset or by increasing the size of the model (if your dataset is already good). But because of the way performance is measured on SOTA benchmarks, it's not necessarily obvious (to me at least) that these larger models are appropriate for simpler problems.
Say I have a simple audio classification problem where I only have one class of interest. If I wanted to implement the latest SOTA models in sound classification, I'd likely end up using some pretty large and complicated architectures. What I'd like to know is: how does one use SOTA benchmarks to inform architecture decisions for tasks that are significantly simpler than those used to evaluate models on these benchmarks?
It feels like the simple answer is to just start simple and scale up as required, but that does feel somewhat like trial and error, so it would be great to hear how other people approach this sort of problem...
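For concreteness, here is a minimal sketch of what "start simple" could look like for the one-class-of-interest audio case: a log-mel spectrogram feeding a small CNN with a single logit ("target sound vs. everything else"). All names, layer sizes, and the 16 kHz / one-second clip assumption are illustrative rather than anything from a benchmark; the point is just that a baseline like this is cheap to train and gives a reference score before reaching for a SOTA architecture.

```python
# Hypothetical baseline: log-mel spectrogram -> tiny CNN -> one logit.
# Clip format (mono, 1 s, 16 kHz) and all layer sizes are assumptions for illustration.
import torch
import torch.nn as nn
import torchaudio

SAMPLE_RATE = 16_000  # assumed input: one-second mono clips at 16 kHz


class SmallAudioClassifier(nn.Module):
    def __init__(self, n_mels: int = 64):
        super().__init__()
        self.melspec = torchaudio.transforms.MelSpectrogram(
            sample_rate=SAMPLE_RATE, n_mels=n_mels
        )
        self.to_db = torchaudio.transforms.AmplitudeToDB()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),  # global average pool -> (batch, 32, 1, 1)
        )
        self.head = nn.Linear(32, 1)  # single logit: class of interest vs. not

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, num_samples)
        x = self.to_db(self.melspec(waveform)).unsqueeze(1)  # (batch, 1, n_mels, time)
        x = self.features(x).flatten(1)                      # (batch, 32)
        return self.head(x).squeeze(1)                       # (batch,) raw logits


if __name__ == "__main__":
    model = SmallAudioClassifier()
    dummy_clips = torch.randn(4, SAMPLE_RATE)               # 4 fake one-second clips
    logits = model(dummy_clips)
    loss = nn.BCEWithLogitsLoss()(logits, torch.ones(4))    # dummy "positive" labels
    print(logits.shape, loss.item())
```

Something this small (tens of thousands of parameters) either solves the problem or it doesn't, and that result is what tells you whether scaling up towards the benchmark-winning architectures is actually worth it.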
floppy_llama t1_j9opzwx wrote
Unfortunately, a lot of ML is just trial and error.