Red-Portal t1_j1vu94s wrote
I think what you're describing is similar to curriculum learning and importance-sampling SGD. The former claims that there is a better order in which to feed data during SGD, one that results in better training. I'm not sure how scientifically grounded that line of research has become, though; it used to be closer to art. The latter is simpler: since some samples are more "destructive" (their gradients have higher variance), you sample them less often and reweight their gradients to compensate, so that the gradient estimate stays unbiased.
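
To make the "sample less often, then compensate" idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the scalar per-sample gradients, the use of |gradient| as the "destructiveness" score, the inverse-score sampling rule, and the helper names are all hypothetical. Published importance-sampling schemes pick the probabilities differently (some sample proportionally to gradient norm instead), but the reweighting by 1 / (N * p_i) works the same way for any choice of probabilities.

```python
import random


def sampling_probs(scores, smoothing=0.1):
    """Map per-sample scores to a sampling distribution.

    Hypothetical rule: a higher score means the sample is drawn LESS often.
    `smoothing` keeps the inverse finite when a score is zero.
    """
    inv = [1.0 / (s + smoothing) for s in scores]
    total = sum(inv)
    return [v / total for v in inv]


def weighted_grad(grads, probs, i):
    """Importance-weighted gradient for sample i: g_i / (N * p_i)."""
    n = len(grads)
    return grads[i] / (n * probs[i])


def expected_estimate(grads, probs):
    """Exact expectation of the weighted gradient under `probs`."""
    return sum(p * weighted_grad(grads, probs, i) for i, p in enumerate(probs))


# Toy per-sample gradients (made-up numbers, scalar for simplicity).
grads = [0.5, -1.0, 4.0, 0.2]
scores = [abs(g) for g in grads]  # assumed "destructiveness" score
probs = sampling_probs(scores)

# One stochastic step would draw an index non-uniformly and use the
# reweighted gradient in place of the raw one.
i = random.choices(range(len(grads)), weights=probs)[0]
step_grad = weighted_grad(grads, probs, i)

# The compensation keeps the estimator unbiased: its expectation equals
# the plain average gradient, whatever the sampling distribution is.
full_grad = sum(grads) / len(grads)
assert abs(expected_estimate(grads, probs) - full_grad) < 1e-12
```

The final assertion is the "numerical compensation" part: the sampling distribution changes which samples you see, but not what the gradient estimate is on average.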