bitcoin_analysis_app t1_ix5g23e wrote
As well as learning a policy, the human brain makes use of prediction error, much like self-supervised learning.
The signal from traditional RL (when you don't reframe it as AlexGrinch mentions) is much sparser than the signal you get from simply feeding the entire collection of human writing to a room full of GPUs.
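To make the sparsity point concrete, here is a toy sketch that counts learning signals for one short sequence. It assumes a sparse-reward setup where only the final step of an episode is rewarded. The helper names `next_token_targets` and `sparse_episode_rewards` are made up for illustration and don't come from any library or from this thread.

```python
def next_token_targets(tokens):
    """Self-supervised view: each prefix is paired with the token that follows it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def sparse_episode_rewards(num_steps, final_reward):
    """Assumed sparse-RL view: zero reward at every step except the last one."""
    return [0.0] * (num_steps - 1) + [final_reward]


tokens = "the cat sat on the mat".split()

# One prediction target (and so one prediction error) per position after the first.
targets = next_token_targets(tokens)
dense_signals = len(targets)

# One non-zero reward for the whole episode of the same length.
rewards = sparse_episode_rewards(len(tokens), final_reward=1.0)
sparse_signals = sum(1 for r in rewards if r != 0.0)

print(f"prediction targets: {dense_signals}, non-zero rewards: {sparse_signals}")
```

The gap grows with sequence length: the number of prediction targets scales with the number of tokens, while the sparse reward stays at one per episode unless the problem is reframed.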