Submitted by zxkj t3_1126g64 in MachineLearning
Red-Portal t1_j8ke7vj wrote
It's literally called importance sampling in the SGD literature. You normally have to downweight the "important samples" to counteract the fact that you're sampling them more often. Whether this practice actually accelerates convergence remained an open question in SGD until very recently. Check this paper.
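To make the reweighting concrete, here's a minimal sketch of importance-sampled SGD on a toy least-squares problem. The choice of per-example scores (gradient-norm proxies based on feature norms) is an illustrative assumption, not the scheme from any particular paper; the key point is the 1/(N·p_i) factor that keeps the gradient estimate unbiased.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: loss_i = 0.5 * (w @ x_i - y_i)^2
N, d = 1000, 5
X = rng.normal(size=(N, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=N)

# Hypothetical importance scores: feature norms as a cheap proxy
# for per-example gradient magnitude (an assumption for illustration).
scores = np.linalg.norm(X, axis=1)
p = scores / scores.sum()  # non-uniform sampling distribution over examples

w = np.zeros(d)
lr, batch = 0.01, 32
init_loss = 0.5 * np.mean((X @ w - y) ** 2)

for _ in range(500):
    idx = rng.choice(N, size=batch, p=p)
    resid = X[idx] @ w - y[idx]
    # Per-example gradient g_i = resid_i * x_i, downweighted by 1/(N * p_i):
    # E_{i ~ p}[g_i / (N p_i)] = (1/N) * sum_i g_i, so the estimate is unbiased
    # even though "important" examples are drawn more often.
    weights = 1.0 / (N * p[idx])
    grad = np.mean(weights[:, None] * resid[:, None] * X[idx], axis=0)
    w -= lr * grad

final_loss = 0.5 * np.mean((X @ w - y) ** 2)
```

Without the 1/(N·p_i) correction, the expected update would be sum_i p_i g_i rather than the true mean gradient, biasing the optimizer toward the frequently sampled examples.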