
Mental-Swordfish7129 t1_j2t17wy wrote

Idk much about other encoding systems. This works well for my purposes, and it's scalable. I look at my data and ask, "How many binary features of each datum are salient, and which features are important to the model for judging similarity?" 2000 may sometimes be too many. Also, remember that a binary vector is often stored as an integer array holding the indices of the bits set to 1. If your vectors are sparse, that representation can be very efficient. For the AI models I build, my vectors are often quite sparse, because for integer data I often use a "slider" of activations: sort of like one-hot, but with three or more consecutive bits set, to encode associativity (nearby values share some active bits).
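To make the "slider" idea concrete, here is a minimal sketch, not the commenter's actual code. It assumes integer values lie in a known range `[min_value, max_value]` and that each value turns on a run of `width` consecutive bits (3 here). The vector is kept in the sparse form described above, a list of set-bit indices. The function names `slider_encode`, `dense_length`, `to_dense`, and `overlap` are hypothetical, chosen for illustration.

```python
def slider_encode(value: int, min_value: int, max_value: int, width: int = 3) -> list[int]:
    """Return the indices of the set bits for `value`: a run of `width` consecutive bits.

    Hypothetical helper: value v turns on bits (v - min_value) .. (v - min_value + width - 1).
    """
    if not min_value <= value <= max_value:
        raise ValueError(f"{value} is outside [{min_value}, {max_value}]")
    start = value - min_value
    return list(range(start, start + width))


def dense_length(min_value: int, max_value: int, width: int = 3) -> int:
    """Total number of bit positions the dense vector needs for this range."""
    return (max_value - min_value) + width


def to_dense(indices: list[int], length: int) -> list[int]:
    """Expand a sparse index list back into a dense 0/1 vector (for inspection only)."""
    bits = [0] * length
    for i in indices:
        bits[i] = 1
    return bits


def overlap(a: list[int], b: list[int]) -> int:
    """Similarity as the count of shared set bits, computed on the sparse form."""
    return len(set(a) & set(b))


if __name__ == "__main__":
    five = slider_encode(5, 0, 9)
    six = slider_encode(6, 0, 9)
    nine = slider_encode(9, 0, 9)
    print(five, six, nine)
    print(overlap(five, six), overlap(five, nine))
    print(to_dense(five, dense_length(0, 9)))
```

With `width = 3`, two values that differ by `d` share `max(0, 3 - d)` active bits. So 5 and 6 overlap in 2 bits, 5 and 7 in 1, and 5 and 9 in none. That graded overlap is the "associativity" a plain one-hot code lacks. Each value only ever stores 3 indices, regardless of how wide the dense vector is.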
