EmbarrassedFuel OP t1_j7p40eo wrote
Reply to comment by UnusualClimberBear in Model/paper ideas: reinforcement learning with a deterministic environment [D] by EmbarrassedFuel
Oh, also the model needs to run quickly at inference time on cheap hardware :)
EmbarrassedFuel OP t1_j7p3xc1 wrote
Reply to comment by UnusualClimberBear in Model/paper ideas: reinforcement learning with a deterministic environment [D] by EmbarrassedFuel
I haven't been able to find anything about optimal control with all of:
- non-linear dynamics/model
- non-linear constraints
- both discrete and continuously parameterized actions in the output space
In general, though, discovery of papers/techniques in control theory seems to be much harder for some reason.
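To make the third point concrete, this is roughly the kind of hybrid action space I mean (a sketch using Gymnasium-style spaces; the names are just illustrative):

```python
# Illustrative sketch of a mixed discrete/continuous action space,
# written with Gymnasium spaces (names are placeholders).
import numpy as np
from gymnasium import spaces

action_space = spaces.Dict({
    # discrete choice, e.g. which mode of operation to select
    "mode": spaces.Discrete(3),
    # continuous parameter attached to the chosen action
    "rate": spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32),
})

sample = action_space.sample()  # e.g. {"mode": 1, "rate": array([0.42], ...)}
```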
EmbarrassedFuel OP t1_j7p519o wrote
Reply to comment by jimmymvp in Model/paper ideas: reinforcement learning with a deterministic environment [D] by EmbarrassedFuel
Basically, given some predicted environment state going forward for, say, 100 time steps, we need to find a minimum-cost course of action. Although the environment state is itself a prediction, for the purposes of this task the agent can treat it as deterministic. The agent has one internal state variable and can take actions that increase or decrease this value through interactions with the environment. We can then calculate the total cost over the given time horizon by simulating the chosen actions step by step, but this simulation is fundamentally sequential and doesn't allow backpropagation of gradients.
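A minimal sketch of that evaluation loop, with stand-in dynamics and cost functions (the real simulator is more involved; all names here are hypothetical):

```python
# Illustrative sketch of the sequential cost evaluation (the dynamics and
# cost functions are hypothetical stand-ins for the real simulator).
import numpy as np

def step_internal_state(s, a, env):      # hypothetical dynamics
    return np.clip(s + a - 0.1 * env, 0.0, 1.0)

def step_cost(s, a, env):                # hypothetical per-step cost
    return (s - env) ** 2 + 0.01 * a ** 2

def rollout_cost(env_states, actions, s0=0.5):
    """Simulate an action sequence against the predicted (deterministic)
    environment states and return the total cost over the horizon."""
    s, total = s0, 0.0
    for env, a in zip(env_states, actions):
        s = step_internal_state(s, a, env)   # each step depends on the last,
        total += step_cost(s, a, env)        # so the loop is inherently sequential
    return total

horizon = 100
env_states = np.random.rand(horizon)   # predicted environment, fixed for planning
actions = np.zeros(horizon)            # candidate plan to be evaluated
print(rollout_cost(env_states, actions))
```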
>you can go with sampling approaches
What exactly do you mean by this? Something like REINFORCE?
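If so, my rough understanding is the score-function trick, i.e. something like this (a sketch with a stand-in cost and an assumed categorical parameterization):

```python
# Rough sketch of a score-function (REINFORCE-style) gradient estimate
# through a non-differentiable rollout; the cost function and action
# parameterization are stand-ins for illustration.
import torch

horizon = 100
logits = torch.zeros(horizon, 3, requires_grad=True)   # e.g. 3 discrete actions/step

def rollout_cost(actions):                  # black-box, non-differentiable simulator
    return float((actions.float() - 1).pow(2).mean())  # stand-in cost

dist = torch.distributions.Categorical(logits=logits)
actions = dist.sample()                     # one sampled plan
cost = rollout_cost(actions)                # scalar cost from the simulator
# score-function estimator: grad E[C] = E[C * grad log p(actions)]
loss = cost * dist.log_prob(actions).sum()
loss.backward()                             # gradient w.r.t. logits without
                                            # backprop through the simulator
# (in practice you'd subtract a baseline from `cost` to reduce variance)
```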
> I guess it is if you're using a MILP approach.
Not sure I follow here, but I'm not using a MILP (as in a mixed-integer linear program). At the moment I'm using a linear programming approximation plus heuristics, an approach that doesn't generalize well.
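For context, the current approach is shaped roughly like this (heavily simplified, with a made-up constraint structure; the real formulation is more involved):

```python
# Heavily simplified sketch of the current LP approximation (hypothetical
# structure): linearize the cost and constraints and solve over the horizon.
import numpy as np
from scipy.optimize import linprog

T = 100
c = np.random.rand(T)                 # stand-in linearized per-step costs
A_ub = np.tril(np.ones((T, T)))       # cumulative-action constraint, e.g. the
b_ub = np.full(T, 5.0)                # internal state can't exceed a capacity
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0.0, 1.0), method="highs")
plan = res.x                          # relaxed continuous plan; the discrete
                                      # choices are then fixed by heuristics
```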
> some combination of MCTS with value function learning
I think this could work; however, without looking into it further I'm not sure it would be fast enough at inference time in my resource-constrained setting.
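To make the compute concern concrete, a minimal sketch of what I understand the approach to be: MCTS that bootstraps leaf evaluations with a learned value function (deterministic transitions assumed; the dynamics and value net below are stand-ins):

```python
# Minimal sketch of MCTS with a learned value function at the leaves
# (deterministic transitions; dynamics and value net are hypothetical).
import math, random

ACTIONS = [0, 1, 2]

def step(state, a):                       # hypothetical deterministic env
    s, t = state
    return (s + (a - 1) * 0.1, t + 1), -abs(s)   # (next state, reward)

def value_net(state):                     # stand-in for the learned V(s)
    return -abs(state[0])

class Node:
    def __init__(self, state):
        self.state, self.children = state, {}
        self.N, self.W = 0, 0.0           # visit count, total value

def select(node, c=1.4):                  # UCT over fully expanded children
    return max(node.children.items(),
               key=lambda kv: kv[1].W / (kv[1].N + 1e-9)
               + c * math.sqrt(math.log(node.N + 1) / (kv[1].N + 1e-9)))

def simulate(node, depth=10):
    if depth == 0:
        v = value_net(node.state)         # bootstrap with V instead of a rollout
    elif len(node.children) < len(ACTIONS):
        a = random.choice([a for a in ACTIONS if a not in node.children])
        child_state, r = step(node.state, a)
        child = node.children[a] = Node(child_state)
        v = r + value_net(child.state)    # expand one child, evaluate with V
        child.N, child.W = 1, v
    else:
        a, child = select(node)
        _, r = step(node.state, a)
        v = r + simulate(child, depth - 1)
    node.N += 1
    node.W += v
    return v

root = Node((0.3, 0))
for _ in range(200):                      # search budget = per-decision compute
    simulate(root)
best = max(root.children, key=lambda a: root.children[a].N)
```

The per-decision cost scales with the search budget (200 simulations here), which is exactly what worries me on cheap hardware.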