ash-050 t1_itkm15f wrote

Hello, I am new to ML and have recently been practicing mainly with scikit-learn. I have a case with a list of independent variables and a dependent profit variable. My question: given the historical data, how can a model help me identify which independent variables I can change to produce a certain increase in profit? Some directions on that would be very helpful.

YamEnvironmental4720 t1_itpvllg wrote

You may want to take a look at the Random Forest algorithm; one of the introductory lectures by Nando de Freitas on YouTube covers this topic. The key word is entropy. The idea is to pick a variable and a threshold value, split the sample points into those whose value for that variable is below the threshold and those whose value is above it, and study how the entropy changes. You do this for all the variables, and for each variable you also test different threshold values.
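As a rough illustration of the threshold idea, here is a minimal sketch in plain Python. It assumes a classification-style target (profit bucketed into "low" and "high"), and the data and the names `price` and `noise` are made up for the example. It is not how any particular library implements its trees.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Drop in entropy after splitting the samples at `threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - weighted


def best_split(values, labels):
    """Try every observed value (except the largest) as a threshold."""
    candidates = sorted(set(values))[:-1]
    return max(
        ((information_gain(values, labels, t), t) for t in candidates),
        default=(0.0, None),
    )


# Hypothetical data: one informative variable and one uninformative one.
labels = ["low", "low", "low", "high", "high", "high"]
price = [1, 2, 3, 4, 5, 6]
noise = [5, 1, 4, 2, 6, 3]

price_gain, price_threshold = best_split(price, labels)
noise_gain, noise_threshold = best_split(noise, labels)
print("price:", price_gain, "at threshold", price_threshold)
print("noise:", noise_gain, "at threshold", noise_threshold)
```

A variable whose best split removes a lot of entropy separates the classes well. Here `price` should come out far ahead of `noise`.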

ash-050 t1_ittne1q wrote

Thank you so much u/YamEnvironmental4720 for your reply. Would I get the same results if I used the trained model's feature importances?

YamEnvironmental4720 t1_ituam06 wrote

It depends on how you define importance. Entropy could be one such definition, but even for forest classifiers there are alternatives to entropy, such as Gini impurity.

ash-050 t1_iu3awlr wrote

Thank you so much. In my case the problem is a regression one.

YamEnvironmental4720 t1_iu3frfr wrote

OK, in that case there is the cost function, defined on the model's parameters, which measures the average distance from the sample points to your hypothesis. This is the average error the model makes for fixed parameter values. In the case of linear regression, the importance of a certain variable is given by the weight parameter attached to that variable, provided the variables are measured on comparable scales.
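To make the cost function concrete, here is a small sketch in plain Python. It assumes a squared-error cost with a 1/(2m) factor (a common convention), and the data, variable meanings, and weights are invented for illustration.

```python
def predict(weights, bias, row):
    """Linear hypothesis: bias + w1*x1 + w2*x2 + ..."""
    return bias + sum(w * x for w, x in zip(weights, row))


def cost(weights, bias, X, y):
    """Squared-error cost: average squared error over the samples, halved."""
    m = len(X)
    return sum(
        (predict(weights, bias, row) - target) ** 2
        for row, target in zip(X, y)
    ) / (2 * m)


# Hypothetical data generated from: profit = 3 * x1 + 0.5 * x2 + 10
X = [[1.0, 20.0], [2.0, 10.0], [3.0, 40.0], [4.0, 30.0]]
y = [3 * x1 + 0.5 * x2 + 10 for x1, x2 in X]

exact_cost = cost([3.0, 0.5], 10.0, X, y)   # the generating parameters
worse_cost = cost([2.0, 0.5], 10.0, X, y)   # first weight is off by 1

# Same data with x2 expressed in units 1000 times larger:
# the matching weight becomes 500 although nothing real has changed.
X_rescaled = [[x1, x2 / 1000] for x1, x2 in X]
rescaled_cost = cost([3.0, 500.0], 10.0, X_rescaled, y)

print(exact_cost, worse_cost, rescaled_cost)
```

The last part shows why raw weights should be compared with care: changing the units of a variable changes its weight without changing the fit, so putting the variables on a common scale first makes the weights easier to compare.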

If you are familiar with multivariable calculus, the dependence of the cost on one such parameter, with the others held fixed, is given by the partial derivative of the cost function with respect to that parameter.
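Written out, assuming a squared-error cost with a 1/(2m) factor over m samples (the symbols J, w, b, and x_{ij} are notation chosen here, not taken from the thread), the cost and its partial derivative with respect to the weight w_j would look like this:

```latex
J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( \sum_{k} w_k x_{ik} + b - y_i \right)^2

\frac{\partial J}{\partial w_j} = \frac{1}{m} \sum_{i=1}^{m} \left( \sum_{k} w_k x_{ik} + b - y_i \right) x_{ij}
```

So the partial derivative for w_j is the average prediction error weighted by the j-th variable: it tells you how quickly the cost changes if you nudge w_j and leave everything else alone.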

This is quite well explained in Andrew Ng's video lecture on linear regression: https://www.youtube.com/watch?v=pkJjoro-b5c&list=PLLssT5z_DsK-h9vYZkQkYNWcItqhlRJLN&index=19.
