biophysninja
biophysninja t1_ixfx9yv wrote
Andrej Karpathy has been creating amazing videos on his channel implementing language models from scratch. If you can create videos at that level of accessibility while presenting fundamental concepts, you will make a difference.
biophysninja t1_j0a4nc2 wrote
Reply to [D] Dealing with extremely imbalanced dataset by hopedallas
There are a few ways to approach this depending on the nature of the data, complexity, and compute available.
1- Use SMOTE: https://towardsdatascience.com/stop-using-smote-to-handle-all-your-imbalanced-data-34403399d3be
2- If your data is sparse, you can use PCA or autoencoders to reduce the dimensionality, then follow up with SMOTE.
3- Use GANs to generate negative samples as another alternative.
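
For anyone unsure what SMOTE actually does, here is a rough, dependency-free sketch of its core idea: pick a minority sample, pick one of its nearest minority neighbours, and create a new point somewhere on the line between them. The function name `smote_sketch`, its parameters, and the toy data are made up for illustration. This is not a full implementation, and in practice you would probably use a maintained one such as `SMOTE` from the imbalanced-learn package.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours.

    Hypothetical helper for illustration only; not a library API.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        t = rng.random()
        synthetic.append([b + t * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features (made-up values).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_sketch(minority, n_new=6, k=2)
```

Because every synthetic point is interpolated between two real minority samples, the new points always stay inside the region the minority class already covers. That is also why reducing dimensionality first (option 2) can help: nearest neighbours are more meaningful in a compact space than in a sparse, high-dimensional one.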