bernhard-lehner t1_ir43bht wrote
Reply to comment by IdentifiableParam in [R] Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging by rlresearcher
Yeah, that's hardly a novel approach... but I have to admit that I could also spend more time checking whether anyone else has already had the idea I'm trying at the moment. We really need "Schmidhuber as a Service" :)
jeankaddour t1_ir4vzq6 wrote
Hi, the author here. Thank you for your comment.
My goal with the paper was not to present weight averaging as a novel approach but rather to study the empirical convergence speed-ups in more detail.
Please have a look at the related work section where I discuss previous works using weight averaging, and feel free to let me know if I missed one that focuses on speedups.