make3333
make3333 t1_ivroe1x wrote
Reply to comment by Difficult_Ferret2838 in [D] Is there an advantage in learning when taking the average Gradient compared to the Gradient of just one point by CPOOCPOS
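gradient descent steps toward the minimum of a local taylor approximation of the loss at the current point, with the step size controlling how far you trust that approximation. in neural nets we use the first-order approximation, as if the loss surface were a plane, so the best direction for a small step is the negative gradient. in a lot of other optimization settings they use a second-order approximation (e.g. newton's method), which also uses curvature to find the optimal direction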
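a rough sketch of the difference, assuming a made-up 2-d quadratic loss f(x) = 0.5 * x^T A x - b^T x. the matrix `A`, vector `b`, start point and learning rate are all hypothetical, picked only so the numbers are easy to check:

```python
# Hypothetical quadratic loss: f(x) = 0.5 * x^T A x - b^T x
A = [[3.0, 1.0], [1.0, 2.0]]  # Hessian (constant for a quadratic)
b = [1.0, 1.0]


def matvec(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def f(x):
    ax = matvec(A, x)
    return 0.5 * (x[0] * ax[0] + x[1] * ax[1]) - (b[0] * x[0] + b[1] * x[1])


def grad(x):
    ax = matvec(A, x)
    return [ax[0] - b[0], ax[1] - b[1]]


def gradient_step(x, lr):
    """First-order step: move against the gradient, scaled by lr."""
    g = grad(x)
    return [x[0] - lr * g[0], x[1] - lr * g[1]]


def newton_step(x):
    """Second-order step: rescale the gradient by the inverse Hessian."""
    g = grad(x)
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    inv = [[A[1][1] / det, -A[0][1] / det],
           [-A[1][0] / det, A[0][0] / det]]
    d = matvec(inv, g)
    return [x[0] - d[0], x[1] - d[1]]


x0 = [2.0, -1.0]
x_gd = gradient_step(x0, lr=0.1)  # one small step downhill
x_newton = newton_step(x0)        # lands on the minimum of the quadratic
print(f(x0), f(x_gd), f(x_newton))
```

on an exactly quadratic loss the newton step hits the minimum in one go, while the gradient step only moves part of the way. for a real neural net the hessian is far too big to invert like this, which is one reason the first-order version is the default.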
make3333 t1_ivqfpxu wrote
Reply to comment by uncooked-cookie in [D] Is there an advantage in learning when taking the average Gradient compared to the Gradient of just one point by CPOOCPOS
the optimal direction under a first-order approximation
make3333 t1_itw03lm wrote
Reply to comment by EnvironmentalBar338 in [D]Cheating in AAAI 2023 rebuttal by [deleted]
you should definitely contact someone higher up
make3333 t1_j47zeza wrote
Reply to comment by chimp73 in [D] Bitter lesson 2.0? by Tea_Pearce
& often you don't even need to fine-tune because of instruction pre-training and few-shot prompting