Mental-Swordfish7129 t1_j2x3juw wrote
Reply to comment by maizeq in [R] Do we really need 300 floats to represent the meaning of a word? Representing words with words - a logical approach to word embedding using a self-supervised Tsetlin Machine Autoencoder. by olegranmo
Idk if it's in the literature. At this point, I can't tell what I've read from what has occurred to me.
I keep track of the error each layer generates, along with a brief history of its descending predictions. I then reinforce the generation of predictions that produce the fastest reduction in subsequent error. I think this amounts to a modulation of attention (manifested as a pattern of bit masking over the ascending error signal), which in effect ignores the portions of the signal that carry low information and high variance.
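A minimal sketch of that per-layer update, if it helps — all the names and the exact scoring rule here are illustrative assumptions on my part, not the actual code:

```python
import numpy as np

def update_attention_mask(error_history, mask_logits, lr=0.1):
    """Sketch: reinforce a bit mask over the ascending error signal,
    keeping components whose error is shrinking fastest (informative)
    and suppressing components with high variance (noisy).
    Illustrative only -- names and scoring are assumptions."""
    errs = np.stack(error_history)             # (T, D) per-component |error|
    reduction = errs[-2] - errs[-1]            # recent rate of error reduction
    variance = errs.var(axis=0)                # noisiness of each component
    score = reduction - variance               # favor informative, stable bits
    mask_logits = mask_logits + lr * score     # reinforce useful components
    mask = (mask_logits > 0).astype(np.uint8)  # hard bit mask on the error
    return mask, mask_logits
```

Components whose error keeps dropping stay unmasked; components that just jitter get masked out, so the layer above only attends to the parts of the signal worth predicting.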
At the bottom layer, this is implemented by choosing behaviors (moving a reticle over an image: up, down, left, right) that accomplish the same thing — avoiding high variance, and thus high noise, while seeking high information gain.
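A toy version of that move selection might look like this — the entropy-minus-variance score is just one way to illustrate "seek information, avoid noise," not the real implementation:

```python
import numpy as np

def pick_move(image, pos, window=8):
    """Sketch of the bottom-layer behavior: from the current reticle
    position, score each candidate move (up/down/left/right) by the
    entropy (information) of the patch it reveals, penalized by its
    variance (noise). Names and the scoring rule are illustrative."""
    moves = {"u": (-window, 0), "d": (window, 0),
             "l": (0, -window), "r": (0, window)}
    h, w = image.shape
    best, best_score = None, -np.inf
    for name, (dy, dx) in moves.items():
        y = np.clip(pos[0] + dy, 0, h - window)
        x = np.clip(pos[1] + dx, 0, w - window)
        patch = image[y:y + window, x:x + window]
        counts = np.bincount(patch.ravel().astype(np.uint8), minlength=256)
        p = counts / counts.sum()
        entropy = -(p[p > 0] * np.log2(p[p > 0])).sum()  # information
        score = entropy - patch.var() / 255.0            # penalize noise
        if score > best_score:
            best, best_score = name, score
    return best
```

On a blank image with structure off to one side, this reticle drifts toward the structured region — which is the "curious agent" behavior I described.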
The end result is a reticle which behaves like a curious agent attempting to track new, interesting things and study them a moment before getting bored.
The highest layers seem to be forming composite abstractions over what is happening below, but I haven't yet tried to analyze them.
I'm fine with questions.