ShadowPirate42 t1_j06glhn wrote
I assume you mean normalizing on axis 1. In most cases this is a bad idea. Think about a house price predictor. You have square footage and number of bathrooms. If you normalize on axis 1, the number of bathrooms will be 0.0003 and the square footage might be 0.6. You are still dealing with different scales, so you might as well not normalize at all. You would be better off still normalizing on axis 0 and then capping the upper and lower ends after normalization, e.g. convert any value above 1 to 1 and any value below 0 to 0.
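To make the difference concrete, here is a rough sketch with made-up numbers. The `houses` and `new_house` frames are hypothetical, and dividing each row by its sum is just one possible reading of "normalizing on axis 1":

```python
import pandas as pd

# Hypothetical toy data: three houses, two features on very different scales.
houses = pd.DataFrame({
    "sq_footage": [1000.0, 2000.0, 3000.0],
    "bathrooms": [1.0, 2.0, 3.0],
})

# Axis 0 (per column) min-max scaling: every feature ends up in [0, 1].
col_min, col_max = houses.min(), houses.max()
col_scaled = (houses - col_min) / (col_max - col_min)

# Axis 1 (per row) scaling, here dividing each row by its own sum:
# square footage still dominates and bathrooms stay close to zero.
row_scaled = houses.div(houses.sum(axis=1), axis=0)

# Capping after normalization: scale unseen data with the training
# min/max, then force anything outside [0, 1] back into range.
new_house = pd.DataFrame({"sq_footage": [4000.0], "bathrooms": [0.5]})
new_scaled = ((new_house - col_min) / (col_max - col_min)).clip(0.0, 1.0)

print(col_scaled)
print(row_scaled)
print(new_scaled)
```

With per-row scaling the two columns keep wildly different magnitudes, which is the problem described above.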
Edit: alternatively, if your data has a lot of outliers, you may want to clip prior to normalization:
xtrain = xtrain.apply(lambda col: col.clip(*col.quantile([min_clip, max_clip]).values))
or just use standardization:
https://dataakkadian.medium.com/standardization-vs-normalization-da7a3a308c64#:~:text=In%20statistics%2C%20Standardization%20is%20the,range%20between%200%20and%201.
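A minimal sketch of both options together, assuming a small made-up `xtrain` and quantile bounds of 0.05 and 0.95 (these values are illustrative, not a recommendation):

```python
import pandas as pd

# Hypothetical training data with one extreme outlier in sq_footage.
xtrain = pd.DataFrame({
    "sq_footage": [800.0, 1000.0, 1200.0, 1400.0, 50000.0],
    "bathrooms": [1.0, 1.0, 2.0, 2.0, 3.0],
})
min_clip, max_clip = 0.05, 0.95  # assumed quantile bounds

# Clip each column to its own quantile range before scaling.
clipped = xtrain.apply(
    lambda col: col.clip(*col.quantile([min_clip, max_clip]).values)
)

# Standardization (z-score) per column: zero mean, unit variance.
standardized = (clipped - clipped.mean()) / clipped.std(ddof=0)

print(clipped)
print(standardized)
```

Standardization does not bound values to [0, 1], but it puts every column on a comparable scale without being tied to the min and max.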