trajo123 t1_j13gu3z wrote
Reply to comment by techni_24 in [D] Techniques to optimize a model when the loss over the training dataset has a Power Law type curve. by Dartagnjan
Reducing the batch size to 1 frees up memory, which can let you fit a bigger model and reach a lower loss on the training set. Note that when batch_size is set to 1, accumulate_grad_batches effectively takes over the role of batch_size, since it controls how many samples contribute to each optimizer step.
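A minimal sketch of what that looks like, assuming `accumulate_grad_batches` refers to the PyTorch Lightning Trainer flag (`MyModel` and `my_dataset` are hypothetical placeholders for your LightningModule and Dataset):

    import pytorch_lightning as pl
    from torch.utils.data import DataLoader

    # One sample per forward/backward pass keeps peak memory low.
    train_loader = DataLoader(my_dataset, batch_size=1, shuffle=True)

    trainer = pl.Trainer(
        # Gradients from 32 consecutive samples are accumulated before each
        # optimizer step, so 32 plays the role an actual batch_size of 32 would.
        accumulate_grad_batches=32,
        max_epochs=10,
    )
    trainer.fit(MyModel(), train_loader)

The trade-off is wall-clock time: accumulating over many size-1 batches gives the same effective batch size but loses the throughput of batched GPU computation.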