Mental-Swordfish7129 t1_j2t17wy wrote on January 3, 2023 at 7:26 PM

Reply to comment by clauwen in [R] Do we really need 300 floats to represent the meaning of a word? Representing words with words - a logical approach to word embedding using a self-supervised Tsetlin Machine Autoencoder. by olegranmo

Idk much about other encoding systems. This works well for my purposes. It's scalable. I look at my data and ask, "How many binary features of each datum are salient, and which of those features matter to the model for judging similarity?" 2000 may be too many sometimes. Also, remember that a binary vector is often handled as an integer array holding the indices of the bits set to 1. If your vectors are sparse, that can be very efficient. For the AI models I build, my vectors are often quite sparse because I often use a scheme like a "slider" of activations for integer data; sort of like "one hot", but I'll set three or more consecutive bits to encode associativity.
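Here's a rough sketch of the "slider" idea with the vector stored as a list of active bit indices. The function names (`slider_encode`, `vector_size`, `overlap`, `to_dense`), the default width of 3, and using shared active bits as the similarity measure are all assumptions made up for illustration, not a fixed recipe.

```python
def slider_encode(value, min_value, max_value, width=3):
    """Encode an integer as the indices of `width` consecutive active bits.

    Hypothetical helper: the window starts at (value - min_value), so
    neighbouring integers share (width - 1) active bits.
    """
    if not min_value <= value <= max_value:
        raise ValueError("value out of range")
    start = value - min_value
    return list(range(start, start + width))


def vector_size(min_value, max_value, width=3):
    """Length of the dense binary vector needed for this value range."""
    return (max_value - min_value) + width


def overlap(a, b):
    """Number of active bits two index-array vectors have in common."""
    return len(set(a) & set(b))


def to_dense(indices, size):
    """Expand an index array into a dense list of 0/1 values."""
    bits = [0] * size
    for i in indices:
        bits[i] = 1
    return bits


five = slider_encode(5, 0, 10)   # bits 5, 6, 7 active
six = slider_encode(6, 0, 10)    # bits 6, 7, 8 active
nine = slider_encode(9, 0, 10)   # bits 9, 10, 11 active

print(five, six, overlap(five, six))   # neighbours share two bits
print(five, nine, overlap(five, nine)) # distant values share none
print(to_dense(five, vector_size(0, 10)))
```

With plain one-hot, 5 and 6 would have zero bits in common, the same as 5 and 9. The wider window is what lets close values look similar, and the index array stays at three integers no matter how long the dense vector gets.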