likeamanyfacedgod OP t1_it6gqnz wrote on October 21, 2022 at 7:52 AM

Reply to comment by KingsmanVince in [D] Accurate blogs on machine learning? by likeamanyfacedgod

It's not trivial to me at all. I've seen a few blog posts that make this statement, but in my own experience it's not true. You can even test it yourself by balancing and unbalancing your dataset. Look at how it is calculated: the TPR and FPR are both fractions, each computed within a single class, so it won't matter if one class is larger than the other. What does matter, though, is whether you care more about predicting one class than the other.
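
A minimal sketch of the "test it yourself" check, under stated assumptions: the scores are made up, the helper names `rates` and `build` are hypothetical, and imbalance is simulated by duplicating the negatives, so the negative score distribution stays the same. TPR and FPR should not move, while precision (included only for contrast) does.

```python
def rates(labels, scores, threshold):
    """Return (TPR, FPR, precision) for binary labels (1 = positive) at a threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    pos = sum(labels)
    neg = len(labels) - pos
    tpr = tp / pos  # fraction of positives flagged: uses positives only
    fpr = fp / neg  # fraction of negatives flagged: uses negatives only
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # mixes both classes
    return tpr, fpr, precision


# Made-up model scores, for illustration only.
pos_scores = [0.9, 0.8, 0.6, 0.3]
neg_scores = [0.7, 0.4, 0.2, 0.1]


def build(neg_copies):
    """Build a dataset whose negatives are repeated `neg_copies` times."""
    labels = [1] * len(pos_scores) + [0] * (len(neg_scores) * neg_copies)
    scores = pos_scores + neg_scores * neg_copies
    return labels, scores


balanced = rates(*build(1), threshold=0.5)     # 4 positives vs 4 negatives
imbalanced = rates(*build(10), threshold=0.5)  # 4 positives vs 40 negatives

print("balanced  :", balanced)
print("imbalanced:", imbalanced)
```

With these numbers, TPR stays at 0.75 and FPR at 0.25 in both runs, while precision drops from 0.75 to 3/13.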

PassionatePossum t1_it6ms0v wrote on October 21, 2022 at 9:20 AM

>the TPR and FPR are both fractions, so it won't matter if one is a larger class than the other.

In most cases that is a desirable property. You don't want to have excellent results just because one class makes up 99% of your dataset and the classifier just predicts the most common class without learning anything. Precision and Recall are also fractions.
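
To illustrate the 99% case, here is a small sketch with made-up counts (10 positives, 990 negatives) and a "classifier" that always predicts the majority class. The variable names are illustrative only.

```python
# Made-up dataset: 1% positives, 99% negatives.
labels = [1] * 10 + [0] * 990

# A "classifier" that has learned nothing: it always predicts the majority class.
predictions = [0] * len(labels)

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# TPR: share of actual positives that were predicted positive.
tpr = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1) / sum(labels)

# FPR: share of actual negatives that were predicted positive.
fpr = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 0) / (
    len(labels) - sum(labels)
)

print(accuracy, tpr, fpr)
```

Accuracy comes out at 0.99, but TPR is 0.0: the per-class fraction exposes what the headline number hides.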

The difference between ROC and Precision/Recall is that ROC needs the concept of a "negative class". That can be problematic for multi-class problems. Even if your data is perfectly balanced across all of your classes, the negative class (i.e. all classes that aren't the class you are examining) is bound to be overrepresented.

Since precision and recall are computed without counting true negatives, a precision/recall plot doesn't have that problem.

So, I don't have a problem with the statement that ROC is appropriate for a balanced dataset (provided that we have a binary classification problem or the number of different classes is at least low).
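
A quick sketch of the one-vs-rest point about the negative class, assuming a made-up dataset of 10 perfectly balanced classes with 100 examples each. The function name is hypothetical.

```python
from collections import Counter


def one_vs_rest_negative_fraction(labels, target):
    """Fraction of examples that land in the "negative class" when `target` is the positive class."""
    counts = Counter(labels)
    return (len(labels) - counts[target]) / len(labels)


# Made-up, perfectly balanced 10-class dataset: 100 examples per class.
labels = [c for c in range(10) for _ in range(100)]

frac = one_vs_rest_negative_fraction(labels, target=3)
print(frac)
```

Even though every class has the same size, 90% of the examples are negatives for any single class, and that share grows as the number of classes grows.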

madrury83 t1_it7hw31 wrote on October 21, 2022 at 2:24 PM

I think the more rigorous way to get at the OP's point is to observe that the AUC is the probability that a randomly selected positive example is scored higher (by your fixed model) than a randomly chosen negative example. Being a probability, it is independent (at the population level) of the number of samples you have from your positive and negative populations (of course, smaller samples get you more sampling variance). I believe this is the OP's point with "they are fractions".
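
The probabilistic reading of the AUC can be sketched directly by comparing every positive score with every negative score. This is a brute-force illustration, not how a library would compute it. The scores are made up, the function name is hypothetical, and counting ties as half a win is a common convention that I am assuming here.

```python
def pairwise_auc(pos_scores, neg_scores):
    """Estimate P(score of random positive > score of random negative)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # assumption: a tie counts as half a win
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up model scores, for illustration only.
pos_scores = [0.9, 0.8, 0.6, 0.3]
neg_scores = [0.7, 0.4, 0.2, 0.1]

auc_balanced = pairwise_auc(pos_scores, neg_scores)         # 4 vs 4
auc_imbalanced = pairwise_auc(pos_scores, neg_scores * 10)  # 4 vs 40

print(auc_balanced, auc_imbalanced)
```

Both calls give 0.8125 (13 of 16 pairs, then 130 of 160). Repeating the negatives changes the class ratio but not the probability. With real samples drawn at different ratios the estimate would vary, but only through sampling noise.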

In any case, can we at least all agree that blogs/articles throwing around this kind of advice without justification are less than helpful?

rehrev t1_it6lcll wrote on October 21, 2022 at 8:58 AM

So you just don't think it's true and don't have an actual reason or explanation?

likeamanyfacedgod OP t1_it6ln21 wrote on October 21, 2022 at 9:03 AM

You can even test it yourself by balancing and unbalancing your dataset. Look at how it is calculated: the TPR and FPR are both fractions, so it won't matter if one class is larger than the other. What does matter, though, is whether you care more about predicting one class than the other.

rehrev t1_it6m93w wrote on October 21, 2022 at 9:12 AM

Actual

likeamanyfacedgod OP t1_ithrjsb wrote on October 23, 2022 at 7:12 PM

I gave you one. Do you have an "actual" reason to back up why it is true, or do you just troll posts without having anything intelligent to contribute?