
just__uncreative t1_it713u8 wrote

Disagree. The above statement from the blog is true.

When you have a large class imbalance skewed toward the negative class, the FPR is not very informative, because it is not sensitive enough to false positives.

The definition of FPR is FP / (FP + TN). When TN is massive because of the class imbalance, your model can be producing many false positives while the FPR stays tiny, giving you an overly rosy view of your performance and ROC curves/AUC that look great, when in reality your model is over-predicting the positive class like crazy.

Precision doesn’t have this problem, and so PR is better.
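To put rough numbers on this (the counts below are made up purely for illustration): with roughly a million negatives, a model can throw off 10,000 false positives and still report a 1% FPR, while precision on the exact same counts is under 10%.

```python
# Hypothetical confusion-matrix counts for a heavily imbalanced problem
# (numbers are invented purely to illustrate the point).
TP = 900       # true positives
FN = 100       # false negatives
FP = 10_000    # false positives
TN = 990_000   # true negatives -- the massive majority class

fpr = FP / (FP + TN)        # false positive rate = 0.01 -> looks great
tpr = TP / (TP + FN)        # recall / sensitivity = 0.90
precision = TP / (TP + FP)  # precision ~= 0.08 -> most alerts are false alarms

print(f"FPR={fpr:.4f}  TPR={tpr:.4f}  precision={precision:.4f}")
```

The ROC operating point (1% FPR at 90% TPR) looks excellent, yet more than 90% of the positive predictions are wrong.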

I have worked on real applications where this has come into play and made a huge difference, because in these class-imbalanced problems the positive class is usually what you're looking for. So if you use ROC for model selection, you end up flooding your predictions with FPs, which adds a lot of noise to the application.

105

BobDope t1_it9cym1 wrote

You are correct. Agree there are tons of trash blogs, but the Machine Learning Mastery dude is legit.

11

hostilereplicator t1_it84dm4 wrote

Not really sure I understand your second paragraph. You can have a high absolute number of false positives with a tiny FPR only if you have a very high volume of negative samples. That isn't an issue with looking at the FPR; it's an issue with not knowing what FPR is acceptable to you for your particular application.

The ROC curve does not assume anything about your positive:negative ratio; the PR curve does, so if the ratio in your test set is different from your ratio in production (and often you don't know what the "true" ratio is in production), your precision measurements will be misleading.
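You can see that dependence directly if you hold a model's TPR and FPR fixed and just vary the prevalence (a sketch; the TPR/FPR values below are arbitrary). The expected precision is TPR·π / (TPR·π + FPR·(1−π)), so it swings wildly with π while TPR and FPR don't move at all.

```python
# Sketch: precision depends on the positive:negative ratio even when the
# model's TPR and FPR are held fixed (the values here are arbitrary).
tpr, fpr = 0.90, 0.01

for pi in [0.5, 0.1, 0.01, 0.001]:  # prevalence = fraction of positives
    precision = tpr * pi / (tpr * pi + fpr * (1 - pi))
    print(f"prevalence={pi:<6}: expected precision={precision:.3f}")
```

That prints roughly 0.99, 0.91, 0.48, 0.08: same model, same ROC point, wildly different precision depending on the ratio in the evaluation set.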

A general difficulty with measuring very low FPR or FNR is the lack of samples to measure on: if you have 10,000 negatives and your FPR is 0.1%, you're only estimating your FPR from 10 false positives, so the estimate will have high variance. But I think this issue would affect precision and recall measurements at the extremes as well, right?
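To put rough error bars on that 0.1% example, here's a minimal sketch using a Wilson score interval (no external stats library assumed):

```python
import math

def wilson_interval(k, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion k/n."""
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# 10 false positives out of 10,000 negatives -> FPR point estimate of 0.1%
lo, hi = wilson_interval(10, 10_000)
print(f"FPR = 0.10%, 95% CI ~ [{lo:.2%}, {hi:.2%}]")  # roughly [0.05%, 0.18%]
```

The interval is almost a factor of two on either side of the point estimate, which is exactly the high-variance problem.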

4

robbsc t1_it7zqsh wrote

One of the main reasons to use an ROC curve is for imbalanced (usually binary) datasets. A more intuitive way to read FPR is FP/N, i.e. false positives over total negatives. The curve tells you what fraction of negatives you're going to pass through as false positives for any given TPR (recall, sensitivity). If the FPR you care about is tiny, you can focus on the left side of the curve and ignore the right side.

It's also useful to sample the ROC curve at the recalls you care about, e.g. how many false positives am I passing through at a TPR of 95%?
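With scikit-learn that's something like the following (`y_true`/`y_score` below are synthetic placeholders; substitute your own labels and scores):

```python
import numpy as np
from sklearn.metrics import roc_curve

# Placeholder data -- replace with your own labels and model scores.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=5000)
y_score = y_true + rng.normal(0, 0.8, size=5000)

fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Sample the ROC curve at the recall you care about, e.g. TPR >= 95%:
idx = np.argmax(tpr >= 0.95)      # first operating point reaching 95% recall
n_neg = np.sum(y_true == 0)       # total negatives in the test set
print(f"TPR={tpr[idx]:.3f}  FPR={fpr[idx]:.3f}  "
      f"~{fpr[idx] * n_neg:.0f} false positives  threshold={thresholds[idx]:.3f}")
```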

Lastly, in my experience, AUC correlates well with an improved model, because most of the right side of the curve doesn't tend to change much and sits close to 1 in situations where you're just trying to improve the left side. If it doesn't, you probably just need to change the number of thresholds you're sampling when computing AUC.

Whether to use ROC or precision-recall depends more on the type of problem you're working on. Obviously precision-recall is better for information retrieval, because you care about what fraction of the information retrieved at a given threshold is useful. ROC is better if you care a lot about the raw number of false positives you're letting through.

3

hostilereplicator t1_it85f0v wrote

If you use precision, you also implicitly assume the data you're measuring on has the same positive:negative ratio as the data you expect to see in the future (assuming you're going to deploy your model rather than just doing retrospective analysis). FPR and TPR don't have this issue, so you can construct a test dataset with sufficiently large numbers of both positives and negatives to get reliable measurements without worrying about the class imbalance.

6

robbsc t1_it87etm wrote

Good point. The only valid criticism of ROC curves that I can think of is that you can't always visually compare two full ROC curves without "zooming in" on the part you care about.
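One workaround is to plot the FPR axis on a log scale so the low-FPR region isn't squashed against the left edge. A sketch, assuming you already have `fpr`/`tpr` arrays from `sklearn.metrics.roc_curve`:

```python
import matplotlib.pyplot as plt

def plot_roc_logx(fpr, tpr, label=None):
    """Plot an ROC curve with a logarithmic FPR axis to 'zoom in' on low FPRs."""
    plt.plot(fpr, tpr, label=label)
    plt.xscale("log")
    plt.xlim(1e-4, 1.0)   # adjust the lower limit to the FPR range you care about
    plt.xlabel("False positive rate (log scale)")
    plt.ylabel("True positive rate")
    if label:
        plt.legend()

# Hypothetical usage with two models' curves:
# plot_roc_logx(fpr_model_a, tpr_model_a, label="model A")
# plot_roc_logx(fpr_model_b, tpr_model_b, label="model B")
# plt.show()
```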

2

rehrev t1_itihgn8 wrote

I am having trouble understanding this. How can your model be over-predicting the positive class while your true negatives are huge compared to your false positives?

What do you mean by over-predicting the positive class if you don't mean high FP compared to TN?

1