Submitted by EmbarrassedFuel t3_10w5f9u in MachineLearning
I have a problem I need to solve that, as far as I can tell, doesn't fit very well into most of the existing RL literature.
Essentially, the task is to create an optimal plan over a time horizon extending a flexible number of steps into the future. The action space is both discrete and continuous: there are multiple distinct actions available, some of which need to be given continuous (but constrained) parameters.
In this problem, however, the state of the environment is known ahead of time for all future time steps, and the updated state of the agent after each action can be calculated deterministically from the action and the environment state.
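To make the setup concrete, here is a minimal sketch of that structure: a discrete action kind with a bounded continuous parameter, a known sequence of environment states, and a deterministic transition. Everything specific in it is an assumption for illustration only. The action names, the bounds in `PARAM_BOUNDS`, the scalar states, and the toy dynamics in `step` are all hypothetical and do not come from the actual problem.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

@dataclass(frozen=True)
class Action:
    kind: str           # discrete choice (hypothetical names below)
    param: float = 0.0  # continuous parameter, constrained per kind

# Hypothetical per-action bounds on the continuous parameter.
PARAM_BOUNDS = {"idle": (0.0, 0.0), "charge": (0.0, 2.0), "discharge": (0.0, 1.5)}

def clip_action(action: Action) -> Action:
    """Project the continuous parameter onto its allowed interval."""
    lo, hi = PARAM_BOUNDS[action.kind]
    return Action(action.kind, min(max(action.param, lo), hi))

def step(agent_state: float, env_state: float, action: Action) -> Tuple[float, float]:
    """Toy deterministic transition: returns (next_agent_state, reward)."""
    a = clip_action(action)
    if a.kind == "charge":
        return agent_state + a.param, -env_state * a.param
    if a.kind == "discharge":
        amount = min(a.param, agent_state)  # cannot go below zero
        return agent_state - amount, env_state * amount
    return agent_state, 0.0

def rollout(agent_state: float, env_states: Sequence[float],
            plan: Sequence[Action]) -> Tuple[float, float]:
    """Evaluate a whole plan exactly, since the future environment is known."""
    total = 0.0
    for env_state, action in zip(env_states, plan):
        agent_state, reward = step(agent_state, env_state, action)
        total += reward
    return agent_state, total
```

Because `rollout` is exact, any candidate plan can be scored without sampling, which is what distinguishes this from the usual stochastic RL setting.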
Modelling the entire problem as a MILP is not feasible due to the size of the action and state spaces, and we have a very large data set of agent and environment states to play with. Does anyone have any suggestions for papers or models that might be appropriate for this scenario?
blackhole077 t1_j7l4yc9 wrote
Perhaps the semi-Markov decision process paper by Sutton, Precup, and Singh would be a good start.
This should give you the paper: http://www-anw.cs.umass.edu/~barto/courses/cs687/Sutton-Precup-Singh-AIJ99.pdf
It sounds like you're looking for "options" in reinforcement learning, so any papers that cover that idea may be of interest to you.
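For a rough idea of what an option looks like in code, here is a minimal sketch. In the options framework an option bundles an initiation set, an intra-option policy, and a termination condition. The sketch simplifies by making termination a deterministic predicate rather than a probability, and the names `Option`, `run_option`, and `corridor_step` plus the toy corridor environment are assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class Option:
    can_start: Callable[[int], bool]    # initiation set: states where the option may begin
    policy: Callable[[int], int]        # intra-option policy: state -> primitive action
    should_stop: Callable[[int], bool]  # termination condition (deterministic here)

def corridor_step(state: int, action: int) -> Tuple[int, float]:
    """Toy deterministic environment: move along a line, each step costs 1."""
    return state + action, -1.0

def run_option(state: int, option: Option,
               step: Callable[[int, int], Tuple[int, float]],
               max_steps: int = 100) -> Tuple[int, float, int]:
    """Execute an option until it terminates; return (state, total reward, steps taken)."""
    if not option.can_start(state):
        raise ValueError("option cannot be initiated in this state")
    total, k = 0.0, 0
    while k < max_steps:
        state, reward = step(state, option.policy(state))
        total += reward
        k += 1
        if option.should_stop(state):
            break
    return state, total, k

# Hypothetical option: "walk right until position 5 is reached".
walk_right = Option(
    can_start=lambda s: s < 5,
    policy=lambda s: 1,
    should_stop=lambda s: s >= 5,
)
```

A higher-level planner then chooses among options like `walk_right` instead of among primitive actions, so one decision covers a variable number of time steps.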