deathisnear t1_j6sflmh wrote
Why not use standard search algorithms for this? Why do you need to use reinforcement learning?
jtpaquet OP t1_j6sgfhj wrote
I don't think I understand what you mean by search algorithm. The part where I generate the map is already done. I want to do the part where the character chooses whether to heal, flee, or attack depending on the situation, so I thought RL was a good fit for this.
deathisnear t1_j6shet8 wrote
You can accomplish this with https://en.m.wikipedia.org/wiki/Minimax
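For what it's worth, the core loop is tiny. Here's a minimal depth-limited sketch, assuming a hypothetical State object with legal_actions(), apply(action), is_terminal(), and evaluate() methods (not your game's actual API):

```python
# Minimal minimax sketch; State and its methods are placeholders for whatever
# your game representation ends up being. evaluate() is a heuristic score
# from the maximizing player's point of view.
def minimax(state, depth, maximizing):
    if depth == 0 or state.is_terminal():
        return state.evaluate()
    if maximizing:
        best = float("-inf")
        for action in state.legal_actions():
            best = max(best, minimax(state.apply(action), depth - 1, False))
        return best
    else:
        best = float("inf")
        for action in state.legal_actions():
            best = min(best, minimax(state.apply(action), depth - 1, True))
        return best
```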
jtpaquet OP t1_j6srph6 wrote
Ok, so basically I understand the search algorithm: it evaluates every possible move. But in this game there would be a lot of possibilities to evaluate.

Movement: if I can move 7 squares, I have 113 possible destinations (on a map with no walls):
- moves that take me farther from the enemy: 113/4 ≈ 29
- moves that keep me at the same distance from the enemy: 113/2 ≈ 56
- moves that bring me closer to the enemy: 113/4 ≈ 28

Attack: I'll count average rolls because damage is a bit random in this game. I'd say there are about 10 distinct average rolls I could do on the enemy, and the closer I am, the more attack options I have:
- from afar: 3 ways of attacking
- from medium distance: 5 ways of attacking
- from close range: 10 ways of attacking

An average game lasts about 5 turns per player, so 10 turns total.

Game outcomes = (moves_far × 3 + moves_medium × 5 + moves_close × 10)^10 ≈ 1.28 × 10^28 possibilities.
Let's eliminate some dumb cases: I'll keep only 4 moves that keep my distance, 10 moves that stay at medium range, and 10 moves that stay at close range.

Game outcomes = (4 × 3 + 10 × 5 + 10 × 10)^10 ≈ 1.25 × 10^22 possibilities.
Even searching only 4 turns ahead would still leave ~700 million possibilities to evaluate.
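Quick sanity check on that arithmetic in Python (the counts are just my rough approximations above, not exact game rules):

```python
# Rough branching-factor estimate for one turn, split by range band.
moves_far, moves_mid, moves_close = 29, 56, 28   # reachable squares per band
atk_far, atk_mid, atk_close = 3, 5, 10           # attack options per band

branching = moves_far * atk_far + moves_mid * atk_mid + moves_close * atk_close
print(branching, branching ** 10)                 # 647 per turn, ~1.3e28 over 10 turns

# After dropping the "dumb" moves: 4 far, 10 medium, 10 close.
pruned = 4 * atk_far + 10 * atk_mid + 10 * atk_close
print(pruned, pruned ** 10, pruned ** 4)          # 162 per turn, ~1.2e22 over 10 turns, ~6.9e8 over 4
```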
Even with pruning, considering this is the base case and that there are many more ways to move on the map because different characters have different spells, I don't know if it would be suitable. It seemed to me that RL was the best alternative because it could potentially learn whether there are walls near the enemy and optimize based on details like that over time.
However, I didn't give the full context of the game, so my bad. The map also changes between matches, and the board is at most 32x32 but typically 16x16. Let me know if there's something I'm not understanding or if the high number of possibilities isn't actually a problem.
deathisnear t1_j6swwf9 wrote
You can implement alpha-beta pruning to further reduce the number of actions evaluated, or look at Monte Carlo tree search as a more scalable option (MCTS combined with deep learning is what AlphaGo used). Games such as Fire Emblem have a similar setup, definitely aren't using RL for this, and still tend to have reasonable AI performance.
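Roughly, the pruning layer on top of plain minimax looks like this (same hypothetical State interface as the minimax sketch above; it returns the same values, it just skips branches that can't change the result, and ordering good moves first makes the cutoffs much more effective):

```python
# Minimal alpha-beta sketch on a placeholder State interface.
# alpha = best score the maximizer can already guarantee,
# beta  = best score the minimizer can already guarantee.
def alphabeta(state, depth, alpha, beta, maximizing):
    if depth == 0 or state.is_terminal():
        return state.evaluate()
    if maximizing:
        value = float("-inf")
        for action in state.legal_actions():
            value = max(value, alphabeta(state.apply(action), depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:   # opponent already has a better option elsewhere
                break
        return value
    else:
        value = float("inf")
        for action in state.legal_actions():
            value = min(value, alphabeta(state.apply(action), depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:
                break
        return value
```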
jtpaquet OP t1_j6tzmhw wrote
Ok, thanks, I'll look into it. I was thinking of maybe using minimax as a base for RL so it already has a starting point to improve from. I considered checking every possible option at first but ruled it out since I thought there would be too many outcomes. Pruning seems to reduce the number of outcomes, so that might be possible after all. Thanks for making me see the problem in another way!