Submitted by kaphed t3_124pbq5 in MachineLearning
U03B1Q t1_je4xekj wrote
https://dl.acm.org/doi/10.1145/3324884.3416545
There was an ASE paper (linked above) which found that, even with identical hyperparameters and random seeds, trained networks varied in accuracy by about 2% due to non-determinism in the parallel computing stack. If the authors chose to retrain the model rather than copying the old numbers, this performance discrepancy would be in line with that finding.
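For context, here is a minimal sketch of the usual knobs for pinning down these non-determinism sources; the choice of PyTorch and these specific settings are my assumptions, not something the paper or the comment specifies:

```python
import os
import random

import numpy as np
import torch

def seed_everything(seed: int = 42) -> None:
    """Pin the common sources of randomness for a reproducible run."""
    random.seed(seed)        # Python's built-in RNG
    np.random.seed(seed)     # NumPy RNG
    torch.manual_seed(seed)  # CPU RNG and all CUDA device RNGs

    # Even with identical seeds, some CUDA kernels are non-deterministic;
    # these settings force deterministic implementations where available.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Required for deterministic cuBLAS behavior on CUDA >= 10.2.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    # Raise an error if an op has no deterministic implementation.
    torch.use_deterministic_algorithms(True)

seed_everything(42)
```

Even with all of this, things like reduction order in multi-GPU training can still differ between runs, which is consistent with the residual run-to-run variance described above.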