Submitted by TheWittyScreenName t3_11s1zfh in MachineLearning
I've found that by dramatically lowering the learning rate (LR) and increasing the number of epochs, very simple baseline models can outperform SoTA models that use far more parameters. Is this considered "cheating" when comparing models? Is this interesting enough to warrant a short paper? I'm not sure what to do with this information.
For example, in the original VGAE paper, when training a GAE, they use an LR of 0.01 and train for 200 epochs, getting 0.91 AUC and 0.92 AP on a link prediction experiment. Rerunning the same experiment with an LR of 5e-5 for 1500 epochs gets 0.97 AUC and 0.97 AP, which is better than the current leader on Papers with Code for this dataset.
It needs more epochs but has way, way fewer parameters than SoTA models. Is this a valid trade-off? Is this even a fair comparison?
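For concreteness, here is a minimal sketch of the rerun described above, written in PyTorch Geometric rather than the original TensorFlow repo. The Cora dataset, the (32, 16) hidden sizes, and the random link split settings are assumptions based on the standard GAE setup from the paper; the only deliberate change is the LR (5e-5 instead of 0.01) and the epoch count (1500 instead of 200).

```python
import torch
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GAE, GCNConv
from torch_geometric.transforms import RandomLinkSplit

# Assumed setup: Cora with the usual 85/5/10 edge split for link prediction.
dataset = Planetoid(root="data", name="Cora")
transform = RandomLinkSplit(num_val=0.05, num_test=0.1,
                            is_undirected=True, split_labels=True,
                            add_negative_train_samples=False)
train_data, val_data, test_data = transform(dataset[0])

class Encoder(torch.nn.Module):
    # Two-layer GCN encoder with the paper's assumed 32/16 dimensions.
    def __init__(self, in_dim, hid=32, out=16):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hid)
        self.conv2 = GCNConv(hid, out)

    def forward(self, x, edge_index):
        return self.conv2(self.conv1(x, edge_index).relu(), edge_index)

model = GAE(Encoder(dataset.num_features))
# The change under discussion: LR 5e-5 for 1500 epochs,
# instead of the paper's LR 0.01 for 200 epochs.
opt = torch.optim.Adam(model.parameters(), lr=5e-5)

for epoch in range(1500):
    model.train()
    opt.zero_grad()
    z = model.encode(train_data.x, train_data.edge_index)
    # recon_loss samples negative edges internally when none are given.
    loss = model.recon_loss(z, train_data.pos_edge_label_index)
    loss.backward()
    opt.step()

model.eval()
with torch.no_grad():
    z = model.encode(test_data.x, test_data.edge_index)
    auc, ap = model.test(z, test_data.pos_edge_label_index,
                         test_data.neg_edge_label_index)
print(f"AUC: {auc:.4f}  AP: {ap:.4f}")
```

Running both hyperparameter settings from the same script would make the LR/epoch schedule the only difference, which keeps the comparison clean.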
CatalyzeX_code_bot t1_jcbcny4 wrote
Found relevant code at https://github.com/tkipf/gae