
AmalgamDragon t1_ja5lz5b wrote

This really comes down to how 'reward' is defined. I think we probably disagree on that definition, with yours being a lot narrower than mine. For example, during the cooking process there is usually a point before the meal is done where it 'smells good', which is a reward. There's dopamine release as well, which may be triggered by completing some of the steps (I don't know whether that's the case), but simply observing that a step is complete is rewarding for lots of folks.
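In RL terms, the broader definition means the agent gets some reward partway through the task instead of only at the end. Here is a minimal sketch of how that changes the discounted return each step sees. The 5-step "cooking" episode, the reward values, and `gamma=0.9` are all made up for illustration:

```python
from typing import List

def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Return-to-go G_t = r_t + gamma * G_{t+1}, computed for every step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step accumulates all future (discounted) reward.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Hypothetical episode, narrow definition: the only reward is the finished meal.
sparse = [0.0, 0.0, 0.0, 0.0, 1.0]
# Hypothetical episode, broad definition: small rewards for "smells good" / step done.
stepwise = [0.1, 0.1, 0.1, 0.2, 1.0]

print(discounted_returns(sparse, gamma=0.9))
print(discounted_returns(stepwise, gamma=0.9))
```

With the sparse rewards, early steps only see a heavily discounted echo of the final reward. With the intermediate rewards, every step gets direct feedback, so the "delay" between an action and its first reward is much shorter.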

> Pure RL will quickly teach you not to touch the burner, but it really struggles with tasks that involve planning or delayed rewards.

Depends on which algorithm you're using, but PPO (Proximal Policy Optimization) can handle this quite well.
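In a typical PPO setup, most of the work of crediting early actions for a late reward is done by the advantage estimate. PPO implementations commonly use Generalized Advantage Estimation (GAE) for this. Below is a minimal sketch of GAE for one terminating episode. The function name `gae`, the reward sequence, and the all-zero critic values (standing in for an untrained critic) are illustrative assumptions, not any particular library's API:

```python
from typing import List, Tuple

def gae(rewards: List[float], values: List[float], gamma: float = 0.99,
        lam: float = 0.95) -> Tuple[List[float], List[float]]:
    """Generalized Advantage Estimation for a single episode that terminates.

    values[t] is the critic's estimate V(s_t). It has one more entry than
    rewards, and the last entry is 0.0 because the episode ends there.
    Returns (advantages, value_targets).
    """
    T = len(rewards)
    assert len(values) == T + 1, "need V(s_0) .. V(s_T)"
    advantages = [0.0] * T
    running = 0.0
    for t in reversed(range(T)):
        # One-step TD error: how much better things went than the critic expected.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future TD errors (decay gamma * lam).
        running = delta + gamma * lam * running
        advantages[t] = running
    # Targets for training the critic: advantage plus the old value estimate.
    targets = [a + v for a, v in zip(advantages, values[:T])]
    return advantages, targets

# Hypothetical episode: reward only at the very end, critic not yet trained.
adv, _ = gae([0.0, 0.0, 0.0, 0.0, 1.0], [0.0] * 6)
print(adv)
```

Even with the reward arriving only on the last step, every earlier step gets a positive advantage. As the critic learns to predict the final reward, the TD errors move earlier in the episode, which is how the delayed reward ends up shaping the first few actions.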
