Submitted by AutoModerator t3_ybjvk5 in MachineLearning
YamEnvironmental4720 t1_itpvllg wrote
Reply to comment by ash-050 in [D] Simple Questions Thread by AutoModerator
You may want to take a look at the Random Forest algorithm, for instance one of the introductory lectures by Nando de Freitas on YouTube on this topic. The key concept is entropy: for each variable, you split the sample points into those below and those above a candidate threshold value and study how the entropy changes. You do this for all the variables, and for each variable you also test different threshold values.
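The threshold search described above can be sketched in a few lines. This is a minimal illustration, not an implementation from any particular library; the function names and the use of every unique value as a candidate threshold are my own choices:

```python
import numpy as np

def entropy(labels):
    # Shannon entropy of a 1-D array of class labels
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_split(x, y):
    """Scan candidate thresholds for a single feature x and return
    the threshold with the largest information gain (drop in the
    weighted average entropy of the two sides of the split)."""
    base = entropy(y)
    best_gain, best_t = 0.0, None
    for t in np.unique(x)[:-1]:  # each unique value as a candidate cut
        left, right = y[x <= t], y[x > t]
        w = len(left) / len(y)
        gain = base - (w * entropy(left) + (1 - w) * entropy(right))
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain
```

Running `best_split` over every feature and picking the feature/threshold pair with the highest gain is exactly the per-node decision a tree in the forest makes.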
ash-050 t1_ittne1q wrote
Thank you so much u/YamEnvironmental4720 for your reply. Would I get the same results if I used the trained model's feature importance?
YamEnvironmental4720 t1_ituam06 wrote
It depends on how you define importance. Entropy could be one such definition but even in forest classifiers there are alternatives to entropy.
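To make the "alternatives to entropy" point concrete, here is a minimal sketch using scikit-learn (assuming it is installed): the same forest can be trained with entropy or with Gini impurity as the split criterion, and its `feature_importances_` attribute then reports the mean decrease in that impurity per feature. The dataset choice is arbitrary:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

for criterion in ("gini", "entropy"):  # two different impurity measures
    clf = RandomForestClassifier(criterion=criterion, random_state=0)
    clf.fit(X, y)
    # feature_importances_ sums to 1; it is the mean decrease in impurity
    print(criterion, clf.feature_importances_.round(3))
```

The two criteria often rank features similarly, but the numeric importances are not identical, which is the sense in which the answer "depends on how you define importance".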
ash-050 t1_iu3awlr wrote
Thank you so much. In my case, though, it's a regression problem.
YamEnvironmental4720 t1_iu3frfr wrote
OK, in that case there is the cost function, defined on the model's parameters, which measures the average distance from the sample points to your hypothesis: the average error the model makes for the fixed parameters. In the case of linear regression, the importance of a certain variable is given by the weight parameter attached to that variable.
If you are familiar with multivariable calculus, the cost's dependence on any one such parameter is given by the partial derivative of the cost function in that direction.
This is quite well explained in Andrew Ng's video lecture on linear regression: https://www.youtube.com/watch?v=pkJjoro-b5c&list=PLLssT5z_DsK-h9vYZkQkYNWcItqhlRJLN&index=19.
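A small sketch of what the lecture covers: the mean-squared-error cost J(w, b) = (1/2m) Σᵢ (w·xᵢ + b − yᵢ)², its partial derivatives with respect to w and b, and a few gradient-descent steps. The toy data and the learning rate are arbitrary choices for illustration:

```python
import numpy as np

def cost(w, b, x, y):
    # J(w, b) = (1 / 2m) * sum((w*x + b - y)^2), the average squared error
    m = len(x)
    return np.sum((w * x + b - y) ** 2) / (2 * m)

def gradient(w, b, x, y):
    # Partial derivatives of J with respect to w and b
    m = len(x)
    err = w * x + b - y
    return np.sum(err * x) / m, np.sum(err) / m

# Toy data generated from y = 2x + 1, so gradient descent
# should recover w ~= 2 and b ~= 1
x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x + 1.0
w, b = 0.0, 0.0
for _ in range(5000):
    dw, db = gradient(w, b, x, y)
    w, b = w - 0.1 * dw, b - 0.1 * db
```

Once trained, the magnitude of a weight like w tells you how strongly the prediction responds to that variable, which is the notion of importance described above (with the usual caveat that the variables should be on comparable scales for the weights to be comparable).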