currentscurrents t1_j2uwlrh wrote
Reply to comment by Mental-Swordfish7129 in [R] Do we really need 300 floats to represent the meaning of a word? Representing words with words - a logical approach to word embedding using a self-supervised Tsetlin Machine Autoencoder. by olegranmo
I think interpretability will help us build better models too. For example, in this paper they deeply analyzed a model trained on a toy problem: addition mod 113.
They found that it was actually working by doing a Discrete Fourier Transform to turn the numbers into sine waves. Sine waves are great for gradient descent because they're easily differentiable (unlike modular addition on the natural numbers, which is not differentiable), and if you choose the right frequency they repeat every 113 numbers. The network then did a bunch of addition and multiplication operations on these sine waves, which ends up giving the same result as modular addition.
This lets you answer an important question: why didn't the network generalize to bases other than 113? Well, the frequency of the sine waves was hardcoded into the network, so it couldn't work for any other base.
This opens the possibility of doing neural network surgery and changing the frequency so it works with any base.
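For anyone who wants to see the trick in action, here's a rough sketch of the mechanism described above (my own illustration, not the paper's actual trained network): embed each number as a point on a circle at some frequency, combine the two embeddings with the angle-addition identities, and read the answer off the largest "logit". The frequency k below is just an illustrative choice.

```python
import numpy as np

# Rough illustration of the mechanism (not the paper's trained network):
# embed a and b as points on a circle at frequency k, combine them with the
# angle-addition identities (only multiplications and additions), and score
# every candidate answer c by cos(w*(a+b-c)). The true (a+b) mod p wins.

p = 113                      # the modulus baked into the frequency
k = 5                        # illustrative key frequency; any k coprime to p works
w = 2 * np.pi * k / p

def mod_add(a, b):
    ca, sa = np.cos(w * a), np.sin(w * a)   # "Fourier" representation of a
    cb, sb = np.cos(w * b), np.sin(w * b)   # and of b
    # cos/sin of w*(a+b) via the angle-addition identities
    c_sum = ca * cb - sa * sb
    s_sum = sa * cb + ca * sb
    # logit for each candidate c is cos(w*(a+b-c)); argmax is (a+b) mod p
    cs = np.arange(p)
    logits = c_sum * np.cos(w * cs) + s_sum * np.sin(w * cs)
    return int(np.argmax(logits))

assert mod_add(70, 80) == (70 + 80) % p     # 37
```

Note how the frequency w is fixed by p: change the modulus without changing w and the argmax lands on the wrong answer, which is exactly the "hardcoded frequency" problem that surgery would have to fix.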
Mental-Swordfish7129 t1_j2v20d2 wrote
That's amazing. We probably haven't fully realized how much analytical power we have available in the Fourier transform, wavelet transforms, and similar tools.
[deleted] t1_j2zn5o5 wrote
I think that's primarily how neural networks do their magic, really. It's frequencies and probabilities all the way down.
Mental-Swordfish7129 t1_j310xxm wrote
Yes! I'm currently playing around with modifying a Kuramoto model to function as a neural network and it seems very promising.
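For anyone unfamiliar, here's a minimal sketch of the plain Kuramoto model (coupled phase oscillators), not the neural-network variant being described; the parameter values are arbitrary and only meant to show the base dynamics.

```python
import numpy as np

# Plain Kuramoto model: N coupled phase oscillators
#   dtheta_i/dt = omega_i + (K/N) * sum_j sin(theta_j - theta_i)
# With strong enough coupling K, the phases synchronize.

rng = np.random.default_rng(0)
N, K, dt, steps = 100, 2.0, 0.01, 2000

theta = rng.uniform(0, 2 * np.pi, N)   # initial phases
omega = rng.normal(0.0, 1.0, N)        # natural frequencies

for _ in range(steps):
    # element [i, j] is theta_j - theta_i; mean over j gives the coupling term
    coupling = np.sin(theta[None, :] - theta[:, None]).mean(axis=1)
    theta += dt * (omega + K * coupling)

# Order parameter r in [0, 1]: r near 1 means the oscillators are in sync.
r = np.abs(np.exp(1j * theta).mean())
print(f"order parameter r = {r:.3f}")
```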
[deleted] t1_j3152ys wrote
Wellllll that seems cool as hell... Seems like steam punk neuroscience hahaha. I love it!