Submitted by hopedallas t3_zmaobm in MachineLearning
biophysninja t1_j0a4nc2 wrote
There are a few ways to approach this depending on the nature of the data, complexity, and compute available.
1- Using SMOTE (see the first sketch below): https://towardsdatascience.com/stop-using-smote-to-handle-all-your-imbalanced-data-34403399d3be
2- If your data is sparse, you can use PCA or autoencoders to reduce the dimensionality, then follow up with SMOTE (second sketch below).
3- Using GANs to generate negative samples is another alternative (third sketch below).
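A minimal sketch of option 1, assuming a binary problem with roughly 5% positives and the 500 features mentioned; the dataset here is synthetic just so the snippet runs on its own:

```python
# Minimal SMOTE sketch with imbalanced-learn; the data below is a synthetic stand-in.
import numpy as np
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=5000, n_features=500,
                           weights=[0.95, 0.05], random_state=0)

# Oversample the minority class up to 50% of the majority count (tunable).
smote = SMOTE(sampling_strategy=0.5, random_state=0)
X_res, y_res = smote.fit_resample(X, y)

print(np.bincount(y), "->", np.bincount(y_res))   # e.g. [4750  250] -> [4750 2375]
```

One caveat: resample only inside the training folds, otherwise synthetic points leak into the validation split and the metrics look better than they really are.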
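For option 2, imblearn's Pipeline applies the sampler only during fit, so the whole thing can be cross-validated cleanly. A sketch assuming PCA down to 50 components (an arbitrary choice to tune); for genuinely sparse scipy.sparse inputs, TruncatedSVD is the drop-in replacement for PCA:

```python
# Sketch: PCA -> SMOTE -> classifier in one imblearn Pipeline (SMOTE runs only at fit time).
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA          # use TruncatedSVD instead for scipy.sparse input
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import make_pipeline

X, y = make_classification(n_samples=5000, n_features=500,
                           weights=[0.95, 0.05], random_state=0)

clf = make_pipeline(
    PCA(n_components=50),                      # 50 is an assumption; pick via explained variance
    SMOTE(random_state=0),
    LogisticRegression(max_iter=1000),
)

scores = cross_val_score(clf, X, y, scoring="average_precision", cv=5)
print(scores.mean())
```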
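Option 3 is the heaviest. A toy PyTorch sketch of the idea only: all sizes and the random "rare class" matrix are placeholders, and in practice a tabular GAN such as CTGAN is usually used rather than this bare version:

```python
# Toy GAN sketch (PyTorch): fit a generator on the rare-class rows only, then sample it
# to synthesize extra rows. Sizes and the stand-in data below are placeholders.
import torch
import torch.nn as nn

latent_dim, n_features, batch = 32, 500, 64

G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, n_features))
D = nn.Sequential(nn.Linear(n_features, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

X_rare = torch.randn(250, n_features)      # stand-in for the real rare-class rows (scaled)

for step in range(2000):
    real = X_rare[torch.randint(0, len(X_rare), (batch,))]
    fake = G(torch.randn(batch, latent_dim))

    # Discriminator step: push real toward 1, fake toward 0.
    d_loss = bce(D(real), torch.ones(batch, 1)) + bce(D(fake.detach()), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to make the discriminator call fakes real.
    g_loss = bce(D(fake), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

synthetic_rows = G(torch.randn(1000, latent_dim)).detach()   # extra rows for the rare class
```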
Far-Butterscotch-436 t1_j0a8ny2 wrote
Regarding 2, there are only 500 features, so dimensionality reduction isn't needed.
1 and 3 are last resorts.
shaner92 t1_j0amnbc wrote
- Has anyone ever seen SMOTE give good results on real-world data?
- It depends what the 500 features are; you could very well benefit from dimensionality reduction, or at least from pruning some features, if they are not all equally useful. That's a separate topic though.
- It's a lot of work to create fake data when the OP already has that amount.
Playing with the loss functions/metrics is probably the best way to go, as you (u/Far-Butterscotch-436) pointed out; a sketch of that route is below.
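A quick sketch of what that can look like, assuming a binary problem: reweight the loss and score with imbalance-aware metrics rather than accuracy. The 19.0 weight is only an illustration for roughly 5% positives (about neg_count / pos_count):

```python
# Cost-sensitive training plus imbalance-aware evaluation (illustrative values).
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, f1_score

# Reweight classes inversely to their frequency (sklearn), or weight positive errors
# by roughly neg_count / pos_count (PyTorch); 19.0 assumes ~5% positives.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
loss_fn = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([19.0]))

# Score with PR-AUC / F1 instead of accuracy, e.g.:
#   average_precision_score(y_true, y_scores)
#   f1_score(y_true, y_pred)
```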
daavidreddit69 t1_j0b5292 wrote
- I believe not; to me it's just a concept, not a practical method in general.