techni_24 t1_j123prp wrote on December 21, 2022 at 2:37 AM

Reply to comment by trajo123 in [D] Techniques to optimize a model when the loss over the training dataset has a Power Law type curve. by Dartagnjan

Maybe this is the novice in me showing, but how does reducing the batch size to 1 affect the model's performance? I thought it only affected the speed of training.

trajo123 t1_j13gu3z wrote on December 21, 2022 at 11:48 AM

Reducing the batch size to 1 lowers the memory needed per training step, so you can fit a bigger model, which in turn can reach a lower loss on the training set. Note that accumulate_grad_batches takes over the role of batch_size when the latter is set to 1: it determines how many samples contribute to each optimizer step.
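
To make the accumulate_grad_batches point concrete, here is a minimal sketch in plain Python (no framework) of why accumulating gradients over several batches of size 1 can stand in for one larger batch. The toy model, the data, and the helper `grad_mse` are all made up for illustration, and the sketch assumes the accumulated gradients are averaged before the update.

```python
# Toy 1-D linear model y = w * x with a mean squared error loss.
def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over the given samples."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

# Made-up data and starting weight, purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]
w = 0.5

# Hypothetical setting, named after the argument mentioned above.
accumulate_grad_batches = 4

# Option A: one batch containing all 4 samples.
full_batch_grad = grad_mse(w, xs, ys)

# Option B: batch size 1, summing per-sample gradients and
# averaging them before a single optimizer step.
accumulated = 0.0
for x, y in zip(xs, ys):
    accumulated += grad_mse(w, [x], [y])
accumulated_grad = accumulated / accumulate_grad_batches

print(full_batch_grad, accumulated_grad)
```

Both options give the same gradient for the update, but option B only ever holds one sample in memory at a time. The equivalence does not carry over to layers whose behavior depends on the batch itself, such as batch normalization.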