
U03B1Q t1_je4xekj wrote

https://dl.acm.org/doi/10.1145/3324884.3416545

There was an ASE paper that found that, even with identical hyperparameters and random seeds, networks showed a variance of about 2% due to non-determinism in the parallel computing workflow. If they chose to retrain it instead of copying the old numbers, this performance discrepancy is in line with that work.
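For anyone wondering how identical seeds can still give different numbers: one commonly cited source (I'm not claiming it's the only one, or the one the paper leans on most) is that floating-point addition isn't associative, so a parallel reduction that accumulates in a different order from run to run can land on slightly different sums. A minimal stdlib-only sketch, where `summed_in_order` is a hypothetical helper that stands in for one thread schedule, not any real framework's API:

```python
import random


def summed_in_order(values, order):
    """Accumulate values in the given index order (stand-in for one thread schedule)."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


# Smallest case: the same three numbers, grouped two different ways.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print("grouping changes the result:", left != right)

# Larger case: identical inputs, only the accumulation order is shuffled.
rng = random.Random(0)
values = [rng.uniform(-1.0, 1.0) * 10 ** rng.randint(-8, 8) for _ in range(10_000)]

distinct_sums = set()
for _ in range(20):
    order = list(range(len(values)))
    rng.shuffle(order)
    distinct_sums.add(summed_in_order(values, order))

print("distinct sums across 20 orderings:", len(distinct_sums))
```

The per-sum differences here are tiny. The idea is that over many training steps they can compound into visibly different final weights, which would be consistent with the run-to-run spread described above.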
