techni_24 t1_j123prp wrote
Reply to comment by trajo123 in [D] Techniques to optimize a model when the loss over the training dataset has a Power Law type curve. by Dartagnjan
Maybe this is the novice in me showing, but how does reducing the batch size to 1 affect model performance? I thought it only affected training speed.
trajo123 t1_j13gu3z wrote
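Reducing the batch size to 1 lowers the activation memory needed per training step, which can let you fit a bigger model on the same hardware and so reach a lower loss on the training set. Note that when batch_size is set to 1, accumulate_grad_batches effectively becomes the batch size: gradients from that many single-sample steps are accumulated before each optimizer update.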
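To illustrate why accumulate_grad_batches takes over the role of batch_size, here is a minimal sketch assuming a toy 1-D linear regression with a mean squared-error loss. The helper names (grad_single, grad_full_batch, grad_accumulated) are hypothetical and not from any library. It checks that summing k single-sample gradients, each scaled by 1/k, gives the same gradient as one batch of k samples. In PyTorch Lightning the analogous setup would be a DataLoader with batch_size=1 plus Trainer(accumulate_grad_batches=k).

```python
import math

# Toy model (assumption for illustration): y_hat = w * x + b,
# per-sample loss 0.5 * (y_hat - y) ** 2, batch loss = mean over samples.

def grad_single(w, b, x, y):
    """Gradient of the per-sample loss w.r.t. (w, b) for one sample."""
    err = w * x + b - y
    return err * x, err

def grad_full_batch(w, b, xs, ys):
    """Gradient of the mean loss over the whole batch, computed in one pass."""
    n = len(xs)
    gw = sum((w * x + b - y) * x for x, y in zip(xs, ys)) / n
    gb = sum((w * x + b - y) for x, y in zip(xs, ys)) / n
    return gw, gb

def grad_accumulated(w, b, xs, ys, accumulate_grad_batches):
    """Process the first `accumulate_grad_batches` samples one at a time
    (batch_size=1), adding each gradient scaled by 1/accumulate_grad_batches,
    which is what the optimizer would see at the next update."""
    k = accumulate_grad_batches
    gw_acc, gb_acc = 0.0, 0.0
    for x, y in zip(xs[:k], ys[:k]):
        gw, gb = grad_single(w, b, x, y)
        gw_acc += gw / k
        gb_acc += gb / k
    return gw_acc, gb_acc

# Usage: compare one batch of 4 against 4 accumulated batch-size-1 steps.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]
full = grad_full_batch(0.5, 0.0, xs, ys)
acc = grad_accumulated(0.5, 0.0, xs, ys, accumulate_grad_batches=4)
print("full batch: ", full)
print("accumulated:", acc)
```

The two gradients agree up to floating-point rounding, so the update is the same while only one sample's activations need to be in memory at a time. The equivalence assumes a per-sample mean loss and no batch-dependent layers such as BatchNorm, whose statistics differ when computed over a single sample.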