Submitted by verbigratia t3_zsvsic in MachineLearning
verbigratia OP t1_j1cxswc wrote
Thanks all.
I've only just started experimenting, but my approach so far has been to discretise the observation space, using Q-table shapes ranging from [8, 8, 8, 8, 8, 8, 2, 2, action_space.n] to [20, 20, 20, 20, 20, 20, 2, 2, action_space.n]. I've been training for 5k-10k episodes with a learning rate of 0.1 and a discount factor of 0.99.
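For anyone wanting to try something similar, here is a rough sketch of that kind of discretisation plus the standard tabular Q-learning update. It is not my exact code. It assumes the first six observation components are continuous and the last two are 0/1 flags (matching the table shapes above). The clipping ranges in LOW and HIGH, the N_ACTIONS value and the helper names are made up for illustration, and a dict stands in for the array-shaped table so the snippet needs no extra libraries.

```python
from collections import defaultdict

# Hypothetical clipping ranges for the six continuous components;
# in practice these would need tuning to the environment.
LOW = [-1.0, -1.0, -2.0, -2.0, -1.0, -2.0]
HIGH = [1.0, 1.0, 2.0, 2.0, 1.0, 2.0]
BINS = 8        # bins per continuous component (8 to 20 in the comment above)
N_ACTIONS = 4   # assumed stand-in for action_space.n
ALPHA = 0.1     # learning rate
GAMMA = 0.99    # discount factor


def discretise(obs, bins=BINS):
    """Map an 8-component observation to a tuple of integer bin indices."""
    state = []
    for x, lo, hi in zip(obs[:6], LOW, HIGH):
        x = min(max(x, lo), hi)                 # clip into the assumed range
        idx = int((x - lo) / (hi - lo) * bins)  # scale to [0, bins]
        state.append(min(idx, bins - 1))        # keep the top edge in the last bin
    # The last two components are treated as binary flags.
    state.extend(int(flag > 0.5) for flag in obs[6:8])
    return tuple(state)


# Q-table keyed by discretised state; unseen states start at all zeros.
q_table = defaultdict(lambda: [0.0] * N_ACTIONS)


def q_update(state, action, reward, next_state, done):
    """One tabular Q-learning step."""
    target = reward if done else reward + GAMMA * max(q_table[next_state])
    q_table[state][action] += ALPHA * (target - q_table[state][action])
```

Coarser bins keep the table small enough to fill within a few thousand episodes, at the cost of lumping together states that may need different actions.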
The results won't win any SpaceX contracts just yet, but the agent does manage soft-ish landings between the flags more often than not.
I found hovering to be a problem, so I added some handling to end the episode after around 500 steps.
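The step cap can be sketched roughly like this. HoverEnv is a made-up stand-in environment that never finishes on its own, and its reset/step signatures are simplified assumptions rather than the real Gym API. The only point is the loop that gives up after max_steps.

```python
MAX_STEPS = 500  # "around 500 steps", as mentioned above


class HoverEnv:
    """Hypothetical environment that hovers forever and never terminates."""

    def reset(self):
        return [0.0] * 8

    def step(self, action):
        # Simplified return value: (observation, reward, done)
        return [0.0] * 8, -0.3, False


def run_episode(env, policy, max_steps=MAX_STEPS):
    """Run one episode, cutting it off once max_steps have been taken."""
    obs = env.reset()
    total_reward = 0.0
    for step in range(1, max_steps + 1):
        obs, reward, done = env.step(policy(obs))
        total_reward += reward
        if done:
            return total_reward, step, False   # ended naturally
    return total_reward, max_steps, True       # cut off by the step cap


total, steps, truncated = run_episode(HoverEnv(), policy=lambda obs: 0)
print(steps, truncated)
```

Returning a separate cut-off flag makes it easy to tell a timeout apart from a real landing or crash when updating the table.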
At this point, I normally start looking at what others have done, and I was surprised not to see more examples demonstrating tabular Q-learning in this scenario (despite the issues with the continuous observation space).
Will look at deep RL next but found it interesting to try the tabular approach first.
Edit: grammar