JustOneAvailableName t1_j1096lz wrote
Reply to comment by Dartagnjan in [D] Techniques to optimize a model when the loss over the training dataset has a Power Law type curve. by Dartagnjan
Sounds like you need a larger batch size. What happens on the hard examples if you take a plateaued model and train it with a huge batch size?
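If a huge batch doesn't fit in memory, gradient accumulation over micro-batches gives the same averaged gradient. Below is a rough sketch of that idea on a toy 1-D linear model, not anything from the original post: the model, the data, and all the names (`grad_sum`, `accumulated_gradient`, `gradient_spread`) are made up for illustration. It also compares how much the gradient estimate jumps around for a small versus a large batch.

```python
import random


def grad_sum(w, xs, ys):
    """Sum of per-example gradients of 0.5 * (w*x - y)**2 with respect to w."""
    return sum((w * x - y) * x for x, y in zip(xs, ys))


def accumulated_gradient(w, xs, ys, micro_batch_size):
    """Mean gradient over all examples, accumulated one micro-batch at a time."""
    total = 0.0
    for start in range(0, len(xs), micro_batch_size):
        stop = start + micro_batch_size
        total += grad_sum(w, xs[start:stop], ys[start:stop])
    return total / len(xs)


def gradient_spread(w, xs, ys, batch_size, trials, rng):
    """Standard deviation of the mean gradient across randomly drawn batches."""
    estimates = []
    for _ in range(trials):
        idx = rng.sample(range(len(xs)), batch_size)
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        estimates.append(grad_sum(w, bx, by) / batch_size)
    mean = sum(estimates) / trials
    return (sum((g - mean) ** 2 for g in estimates) / trials) ** 0.5


# Toy data (assumed for illustration): y = 3x plus noise, model currently at w = 2.5.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(1024)]
ys = [3.0 * x + rng.gauss(0.0, 0.5) for x in xs]
w = 2.5

full = grad_sum(w, xs, ys) / len(xs)               # one huge batch
accum = accumulated_gradient(w, xs, ys, 32)        # same batch, 32 examples at a time
small_spread = gradient_spread(w, xs, ys, 8, 200, rng)
large_spread = gradient_spread(w, xs, ys, 512, 200, rng)

print("full-batch gradient:", full)
print("accumulated gradient:", accum)
print("gradient spread, batch size 8:", small_spread)
print("gradient spread, batch size 512:", large_spread)
```

The accumulated gradient should match the full-batch one up to floating-point rounding, and the spread should be noticeably smaller for the larger batch. Whether that helps on the hard examples of a plateaued model is exactly the thing to test.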