verbigratia OP t1_j1cxswc wrote
Reply to [D] Non-deep Q learning with OpenAI gym lunar lander - anyone? by verbigratia
Thanks all.
I've only just started experimenting, but my approach so far has been to discretise the observation into a Q-table with sizes varying between [8, 8, 8, 8, 8, 8, 2, 2, action_space.n] and [20, 20, 20, 20, 20, 20, 2, 2, action_space.n], trained over 5k-10k episodes with a learning rate of 0.1 and a discount factor of 0.99.
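For anyone curious what that looks like in practice, here's a minimal sketch of the discretisation and Q-table setup. The bin counts match the smaller table above, but the clipping ranges for the six continuous dims are rough guesses on my part, not tuned values:

```python
import numpy as np
import gymnasium as gym  # or `import gym` for older versions

env = gym.make("LunarLander-v2")

BINS = (8, 8, 8, 8, 8, 8, 2, 2)  # per-dimension bin counts (last two dims are the leg-contact flags)
# Rough bounds for the six continuous observation dims (x, y, vx, vy, angle, angular velocity);
# these ranges are assumptions, adjust to taste.
LOW  = np.array([-1.0, -0.5, -2.0, -2.0, -1.0, -2.0])
HIGH = np.array([ 1.0,  1.5,  2.0,  2.0,  1.0,  2.0])

def discretise(obs):
    """Map a continuous observation to a tuple of bin indices."""
    cont = np.clip(obs[:6], LOW, HIGH)
    idx = ((cont - LOW) / (HIGH - LOW) * (np.array(BINS[:6]) - 1)).astype(int)
    legs = obs[6:8].astype(int)  # leg-contact flags are already 0/1
    return tuple(idx) + tuple(legs)

# Q-table: one row of action values per discrete state
q_table = np.zeros(BINS + (env.action_space.n,))
```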
The results will not win any SpaceX contracts just yet, but they do result in soft-ish landings between the flags more often than not.
I found hovering to be a problem, so I added some handling to exit the episode after around 500 steps.
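The training loop is otherwise bog-standard tabular Q-learning; a sketch is below. The learning rate and discount match what I quoted above, the 500-step cutoff handles the hovering issue, and the epsilon schedule is just an assumed slow decay rather than anything I've tuned carefully:

```python
ALPHA, GAMMA = 0.1, 0.99
EPISODES, MAX_STEPS = 10_000, 500

epsilon = 1.0
for episode in range(EPISODES):
    obs, _ = env.reset()
    state = discretise(obs)
    for step in range(MAX_STEPS):  # cut the episode short if the lander just hovers
        # epsilon-greedy action selection
        if np.random.random() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[state]))

        obs, reward, terminated, truncated, _ = env.step(action)
        next_state = discretise(obs)

        # standard tabular Q-learning update
        best_next = np.max(q_table[next_state])
        q_table[state + (action,)] += ALPHA * (
            reward + GAMMA * best_next - q_table[state + (action,)]
        )
        state = next_state
        if terminated or truncated:
            break
    epsilon = max(0.05, epsilon * 0.999)  # slow decay; purely an assumed schedule
```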
At this point I would normally start looking at what others have done, and I was surprised not to see more examples demonstrating tabular Q-learning in this scenario (despite the issues with the continuous observation space).
I'll look at deep RL next, but I found it interesting to try the tabular approach first.
Edit: grammar