Submitted by olmec-akeru t3_z6p4yv in MachineLearning
Deep-Station-1746 t1_iy2k52e wrote
It depends entirely on your assumptions: assumptions about the data, and about the model. Without assumptions, you can't do dimensionality reduction.
So, what are you willing to assume? e.g. Assuming you have vast quantities of text data, arguably the current best dim reducers are generative transformers.
JamesBaxter_Horse t1_iy2zcev wrote
No free lunch theorem
koiRitwikHai t1_iy48qvn wrote
Can you tell us what the assumptions are for PCA, t-SNE, VAE, and UMAP?
SleekEagle t1_iy4k5m4 wrote
It's been a while since I looked at t-SNE and UMAP, but the assumption for PCA is that the data lives near an affine subspace, and for a VAE that the data is well modeled by the distribution whose parameters you are learning. Those are my thoughts, but I'm sure there are other considerations; I'd love to hear other people chime in!
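To make the PCA assumption concrete, here's a minimal numpy sketch (synthetic data, all names hypothetical): points generated near a 1-D affine subspace of R^3, where a rank-1 PCA reconstruction recovers them almost exactly because the assumption holds.

```python
import numpy as np

# Hypothetical illustration: data lying near a 1-D affine subspace of R^3.
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
direction = np.array([[1.0, 2.0, -1.0]])
offset = np.array([5.0, -3.0, 2.0])
X = t @ direction + offset + 0.01 * rng.normal(size=(200, 3))

# PCA by hand: center the data, then take the top right-singular vector.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
proj = Xc @ Vt[:1].T                      # 1-D coordinates
recon = proj @ Vt[:1] + X.mean(axis=0)    # map back to R^3

# Relative reconstruction error is tiny because the affine assumption holds.
err = np.linalg.norm(X - recon) / np.linalg.norm(X)
```

If the data instead lay on a curved manifold, the same rank-1 reconstruction would have a large error, which is exactly where the PCA assumption breaks down.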
manojs t1_iy4jrt6 wrote
If an expert on the topic can respond to this, that would be awesome.
olmec-akeru OP t1_iy2pewq wrote
Awesome answer: for a set of assumptions, what would you use?
I've seen some novel approaches on arXiv on categorical variables; but can't seem to shake the older deep-learning methods for continuous variables.
NonOptimized t1_iy2qybl wrote
>Awesome answer: for a set of assumptions, what would you use?
>
>I've seen some novel approaches on arXiv on categorical variables; but can't seem to shake the older deep-learning methods for continuous variables.
Could you link some of these novel approaches? I'm quite curious what they could be?
olmec-akeru OP t1_iy2zjoi wrote
https://arxiv.org/pdf/2204.04273.pdf
https://arxiv.org/pdf/2203.09347.pdf
https://arxiv.org/pdf/2206.06513.pdf
and the one speaking to categorical variables: https://arxiv.org/pdf/2112.00362.pdf
BrisklyBrusque t1_iy3s0ha wrote
Cool links. I’ll add “entity embeddings” into the mix. Entity embeddings reimagine a categorical variable as a continuous-valued vector and allow us to skip one-hot encoding.
olmec-akeru OP t1_iy7a1yc wrote
I fear that a category's location in the domain creates a false relationship with categories that happen to sit nearby on that domain,
i.e. if you encode categories at 0.1, 0.2, …, 0.9, you're saying the category encoded as 0.2 is more similar to 0.1 and 0.3 than it is to 0.9. This may not be true.
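A tiny sketch of that concern (category names and codes are made up): ordinal codes impose distances that come purely from the order the categories were listed in, whereas one-hot encoding makes every pair equidistant.

```python
import numpy as np

# Hypothetical categories given arbitrary ordinal codes 0.1, 0.2, ...
colours = ["red", "green", "blue", "purple"]
codes = {c: (i + 1) / 10 for i, c in enumerate(colours)}

# Under this encoding, "red" looks twice as close to "green" as to "blue",
# purely because of the arbitrary listing order.
d_red_green = abs(codes["red"] - codes["green"])  # 0.1
d_red_blue = abs(codes["red"] - codes["blue"])    # 0.2

# One-hot encoding removes the spurious ordering: all pairs are equidistant.
onehot = np.eye(len(colours))
d01 = np.linalg.norm(onehot[0] - onehot[1])
d02 = np.linalg.norm(onehot[0] - onehot[2])       # equal to d01
```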
BrisklyBrusque t1_iy8wfoa wrote
I freely admit I haven't looked into the math. But my understanding is that the embeddings are a learned representation. They are not arbitrary; instead they aim to place categories close to one another on a continuous scale only where the data justifies it.
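That "learned, not arbitrary" point can be shown with a minimal numpy sketch (a toy stand-in for entity embeddings; the setup and targets are all assumed): each category owns a trainable vector, fitted by gradient descent so a fixed linear readout predicts the target. Categories that behave alike end up close together, regardless of listing order.

```python
import numpy as np

# Hypothetical toy: 4 categories, 2-D embeddings, fixed linear readout.
rng = np.random.default_rng(1)
n_cats, dim = 4, 2
emb = 0.01 * rng.normal(size=(n_cats, dim))  # embedding table (learned)
w = np.ones(dim)                             # readout, kept fixed for clarity

# Categories 0 and 1 share one target value, 2 and 3 the other.
cats = np.tile(np.arange(n_cats), 50)
y = np.where(cats < 2, 1.0, -1.0)

lr = 0.1
for _ in range(200):
    grad = emb[cats] @ w - y                   # per-sample residual
    g_emb = np.zeros_like(emb)
    np.add.at(g_emb, cats, grad[:, None] * w)  # scatter gradients to rows
    emb -= lr * g_emb / len(cats)

# Learned geometry reflects behaviour, not listing order.
d_same = np.linalg.norm(emb[0] - emb[1])  # alike categories: small distance
d_diff = np.linalg.norm(emb[0] - emb[2])  # different categories: large distance
```

In practice this is what an embedding layer in a neural net does, just jointly with the rest of the model rather than against a fixed readout.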
NonOptimized t1_iy32aw7 wrote
Neat, thanks, will give them a read!
Honest-Debate-6863 t1_iy4xe0x wrote
Where do y’all work? How are you guys so knowledgeable?