derpderp3200 OP t1_j1vmr20 wrote
Reply to comment by eigenham in [D] Has any research been done to counteract the fact that each training datapoint "pulls the model in a different direction", partly undoing learning until shared features emerge? by derpderp3200
Similar but not identical? What effect do you mean?
But yeah, the way I see it, the network isn't descending a single gradient toward a "good classifier" optimum, but rather down whatever gradient is left after the otherwise-destructive interference of the individual training examples' gradients, as opposed to a more "purposeful" extraction of features.
This happens to result in gradual movement toward being a decent classifier, but it relies strictly on large, balanced, well-crafted datasets to cancel the "pull vectors" out to roughly zero so that the convergence effect dominates, and it comes with incredibly high training costs.
I don't know what it would look like, but surely a more "cooperative" learning process would learn faster, if not better.
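The cancellation I mean can be sketched in a few lines (a toy example of my own, not from any paper): a one-parameter linear model with two training examples whose per-example gradients point in opposite directions. The averaged batch gradient is much smaller than either individual gradient, so most of each example's "pull" cancels, and only the shared component survives to drive learning.

```python
import numpy as np

# Toy model y = w * x with squared-error loss, two examples whose
# targets conflict, so their gradients partly cancel (the hypothetical
# numbers here are just for illustration).
w = 0.0
examples = [(1.0, 2.0), (1.0, -1.0)]  # (input x, target y) pairs

def grad(w, x, y):
    # d/dw of 0.5 * (w*x - y)^2
    return (w * x - y) * x

per_example = [grad(w, x, y) for x, y in examples]
batch = float(np.mean(per_example))

print(per_example)  # [-2.0, 1.0] -- the individual "pull vectors"
print(batch)        # -0.5 -- what survives the cancellation
```

Most of the per-example gradient magnitude (3.0 in total) is destroyed by averaging, leaving only the 0.5 of shared direction, which is the "leftover gradient" the comment describes.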