Submitted by zxkj t3_1126g64 in MachineLearning
Red-Portal t1_j8ke7vj wrote
It's literally called importance sampling in the SGD literature. You normally have to downweight the "important samples" to counteract the fact that you're sampling them more often. Whether this practice actually accelerates convergence remained an open question in SGD until very recently. Check this paper.
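To make the reweighting concrete, here's a minimal sketch of importance-sampled SGD on a toy least-squares problem. The choice of per-example scores (gradient-norm proxies based on feature norms) is an illustrative assumption, not the scheme from any particular paper; the key point is the 1/(N·p_i) factor that keeps the gradient estimate unbiased.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: loss_i = 0.5 * (w @ x_i - y_i)^2
N, d = 1000, 5
X = rng.normal(size=(N, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=N)

# Hypothetical importance scores: feature norms as a cheap proxy
# for per-example gradient magnitude (an assumption for illustration).
scores = np.linalg.norm(X, axis=1)
p = scores / scores.sum()  # non-uniform sampling distribution over examples

w = np.zeros(d)
lr, batch = 0.01, 32
init_loss = 0.5 * np.mean((X @ w - y) ** 2)

for _ in range(500):
    idx = rng.choice(N, size=batch, p=p)
    resid = X[idx] @ w - y[idx]
    # Per-example gradient g_i = resid_i * x_i, downweighted by 1/(N * p_i):
    # E_{i ~ p}[g_i / (N p_i)] = (1/N) * sum_i g_i, so the estimate is unbiased
    # even though "important" examples are drawn more often.
    weights = 1.0 / (N * p[idx])
    grad = np.mean(weights[:, None] * resid[:, None] * X[idx], axis=0)
    w -= lr * grad

final_loss = 0.5 * np.mean((X @ w - y) ** 2)
```

Without the 1/(N·p_i) correction, the expected update would be sum_i p_i g_i rather than the true mean gradient, biasing the optimizer toward the frequently sampled examples.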