Submitted by fedegarzar t3_z9vbw7 in MachineLearning
Internal-Diet-514 t1_iykhg3s wrote
Reply to comment by marr75 in [R] Statistical vs Deep Learning forecasting methods by fedegarzar
I think so too. I'm confused about why they would need to train for 14 days; from skimming the paper, the dataset itself doesn't seem that large. I'd bet a DL solution that was parameterized correctly for the problem would outperform the traditional statistical approaches.
marr75 t1_iykwulm wrote
While I agree with your general statement, my gut says a well-parameterized/regularized deep learning solution would perform as well as an ensemble of statistical approaches (without the expertise needed to select the statistical approaches), but it would be harder to explain/interpret.
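For concreteness, here's a minimal sketch of the kind of statistical ensemble I mean: average the forecasts of an ARIMA model and exponential smoothing. The toy series, model orders, and horizon are placeholders, not anything from the paper:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Toy series standing in for a real dataset
y = np.sin(np.linspace(0, 20, 200)) + np.random.normal(0, 0.1, 200)
h = 12  # forecast horizon

# Two classical models; choosing good orders/components is where the
# expertise comes in
arima_fc = ARIMA(y, order=(2, 0, 1)).fit().forecast(steps=h)
ets_fc = ExponentialSmoothing(y, trend="add").fit().forecast(h)

# The "ensemble" here is just a simple average of the two forecasts
ensemble_fc = (arima_fc + ets_fc) / 2
```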
TheDrownedKraken t1_iyko6jf wrote
I’m just curious, why do you think that?
Internal-Diet-514 t1_iymjci2 wrote
If a model has more parameters than data points in the training set, it can quickly memorize the training set, resulting in an overfit model. You don't always need 16+ attention heads to have the best model for a given dataset. A single self-attention layer with one head still has the ability to model more complex relationships among the inputs than something like ARIMA would.
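To make that concrete, here's a rough sketch (PyTorch, with my own toy sizes, nothing from the paper) of a one-step-ahead forecaster built around a single self-attention layer with one head:

```python
import torch
import torch.nn as nn

class TinyAttentionForecaster(nn.Module):
    def __init__(self, d_model=16):
        super().__init__()
        self.embed = nn.Linear(1, d_model)  # scalar observations -> d_model
        self.attn = nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)
        self.head = nn.Linear(d_model, 1)   # one-step-ahead prediction

    def forward(self, x):                   # x: (batch, context_len, 1)
        z = self.embed(x)
        z, _ = self.attn(z, z, z)           # single-head self-attention
        return self.head(z[:, -1])          # predict next value from the last position

model = TinyAttentionForecaster()
print(sum(p.numel() for p in model.parameters()))  # ~1.1k parameters
```

Even at that size, the attention layer lets every position condition on every other position in an input-dependent way, which a linear AR model can't do.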
kraegarthegreat t1_iyor5g6 wrote
This is something I have found in my research. I keep seeing people build models with millions of parameters when I am able to achieve 99% of the performance with roughly 1k.
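For a sense of the size gap being described, compare a stock stacked transformer encoder against something like the ~1k-parameter sketch above (the sizes here are hypothetical, just PyTorch defaults):

```python
import torch.nn as nn

# A fairly standard encoder stack: 6 layers, 16 heads, default feed-forward width
big = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=16, batch_first=True),
    num_layers=6,
)
print(sum(p.numel() for p in big.parameters()))  # ~19M parameters
```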