bitcoin_analysis_app t1_ix5g23e wrote

As well as learning a policy, the human brain makes use of prediction error, much like self-supervised learning.

The signal from traditional RL (when you don't reframe it as AlexGrinch mentions) is much sparser than what you get from simply feeding the entire collection of human writing to a room full of GPUs.
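To put rough numbers on the sparsity gap, here is a toy sketch. It assumes the RL agent gets one scalar reward per episode and that next-token prediction gets one error term per token after the first. The function names and the lengths are made up for illustration and do not come from any real system.

```python
# Toy comparison of how many learning signals each setup yields.
# All names and numbers are illustrative assumptions.

def rl_signals(episode_lengths):
    """Sparse RL: assume one scalar reward arrives at the end of each episode."""
    return len(episode_lengths)

def self_supervised_signals(token_counts):
    """Next-token prediction: every token after the first gives a prediction error."""
    return sum(max(n - 1, 0) for n in token_counts)

# Hypothetical lengths: steps per episode, or tokens per document.
lengths = [200, 350, 500]

sparse = rl_signals(lengths)
dense = self_supervised_signals(lengths)
print(f"sparse RL signals: {sparse}")
print(f"self-supervised signals: {dense}")
```

Under these assumptions the same amount of experience gives a few reward values in one case and on the order of a thousand prediction errors in the other. Reward shaping or reframing the problem would change the RL count.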
