Submitted by Vegetable-Skill-9700 t3_121a8p4 in MachineLearning
atheist-projector t1_jdm7hw1 wrote
Reply to comment by soggy_mattress in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Especially when you consider that SGD only converges to a local minimum, we can probably do a whole lot better if we find a nicer optimizer.