Viewing a single comment thread. View all comments

mtocrat t1_j4zecpm wrote

What you're describing is a general approach to RL that is used in different forms in many methods: sample actions, weight or rank them in some way by the estimated return, regress to the weighted actions. So you're not suggesting to do something other than RL but to replace one RL approach with a different RL approach.

2