Viewing a single comment thread. View all comments

Red-Portal t1_j8ke7vj wrote

It's literally called importance sampling in the SGD literature. You normally have to downweigh the "important samples" to counter the fact that you're sampling them more often. Whether this practice actually accelerates convergence has been an important question in SGD until very recently. Check this paper.

3