RegularBasicStranger t1_iy3xlyx wrote

Maybe the AI could play against variants of itself. Each variant would start from different values, as if it had already played partway through a game against some people and then been pulled out, so the other copies of itself effectively stand in for those people.

That gives many different combinations of variants to train against and improve on, without needing actual data.

So if new rules are added, the AI could play just one game under the new rules, treating every move the people made as the best move until proven otherwise. Multiple variants of the AI could then be created simply by taking the AI's beliefs about the players from different points in that game.

So there would be the AI at time 0, the same AI at time 10, at time 20, and so on: multiple variants to train against and improve on, without needing costly real data.
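
To make the idea concrete, here is a rough sketch of what I mean, not a real training setup. Everything in it is made up for illustration: the move names, the `snapshot_beliefs` and `make_variant` helpers, and the assumption that a "belief" is just a count of moves seen so far in the one observed game.

```python
from collections import Counter
from itertools import combinations


def snapshot_beliefs(observed_moves, every=10):
    """Copy the AI's belief about the players at time 0, 10, 20, ...

    The belief here is only a tally of the moves seen so far
    (an assumption made for this sketch).
    """
    snapshots = {}
    belief = Counter()
    for t, move in enumerate(observed_moves):
        if t % every == 0:
            snapshots[t] = belief.copy()
        belief[move] += 1
    return snapshots


def make_variant(belief):
    """Build a variant that plays the move it has seen most often.

    With no belief yet (time 0) it falls back to a hypothetical "pass" move.
    """
    def policy():
        return belief.most_common(1)[0][0] if belief else "pass"
    return policy


# One observed game under the new rules (hypothetical moves).
observed = ["call"] * 6 + ["raise"] * 4 + ["fold"] * 15

snaps = snapshot_beliefs(observed)
variants = {t: make_variant(b) for t, b in snaps.items()}

# Every pair of variants is a possible training match-up.
pairings = list(combinations(sorted(variants), 2))

for t in sorted(variants):
    print(f"variant at time {t} plays: {variants[t]()}")
print("match-ups:", pairings)
```

Each snapshot behaves like a different opponent even though they all come from the same single game, which is the whole point: the number of match-ups grows with the number of snapshots, not with the amount of real data.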