Submitted by jacobgil t3_11orezx in MachineLearning

pip install confidenceinterval

tldr: You don't have an excuse anymore not to use confidence intervals!


In statistics, confidence intervals are commonly reported alongside accuracy metrics to help interpret them.

For example, an AUC might be 0.9, but if the 95% confidence interval is [0.7, 0.96], we can't confidently say we didn't just get lucky, and we should be really careful making decisions around that result.

More formally, a confidence interval gives a range in which the true, unknown value of the metric plausibly lies. A 95% confidence interval means that if we repeated the experiment many times, 95% of the intervals we reported would contain the true (unknown) metric. This property is called coverage.
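To make the coverage idea concrete, here is a minimal simulation sketch. It assumes a classifier whose true accuracy is known (which never happens in practice), repeats a hypothetical experiment many times, and counts how often a simple normal-approximation (Wald) interval contains the true value. The function names and parameter values are made up for illustration.

```python
import math
import random


def wald_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) interval for a proportion, clipped to [0, 1]."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


def estimate_coverage(true_accuracy=0.8, n_samples=500, n_experiments=2000, seed=0):
    """Fraction of simulated experiments whose interval contains the true accuracy."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # One "experiment": evaluate the model on n_samples fresh iid examples.
        correct = sum(rng.random() < true_accuracy for _ in range(n_samples))
        low, high = wald_interval(correct, n_samples)
        hits += low <= true_accuracy <= high
    return hits / n_experiments


coverage = estimate_coverage()
print(f"empirical coverage of the 95% interval: {coverage:.3f}")
```

With settings like these the printed fraction should land close to 0.95, which is what a 95% interval promises.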

Confidence intervals are usually computed either analytically, by making some assumptions about the metric's distribution and using the central limit theorem, or by bootstrapping: resampling the results with replacement again and again, computing the metric on each resample, and looking at the resulting distribution.
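As a rough illustration of the two approaches, the sketch below computes a 95% interval for plain accuracy both ways: analytically with the normal approximation, and with a percentile bootstrap. The data is synthetic and the helper names are hypothetical; real packages typically offer more refined variants of both.

```python
import math
import random


def normal_approx_ci(correct, z=1.96):
    """Analytic interval: accuracy +/- z * standard error (central limit theorem)."""
    n = len(correct)
    acc = sum(correct) / n
    half_width = z * math.sqrt(acc * (1 - acc) / n)
    return acc - half_width, acc + half_width


def percentile_bootstrap_ci(correct, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample with replacement, recompute, take quantiles."""
    rng = random.Random(seed)
    n = len(correct)
    stats = sorted(sum(rng.choices(correct, k=n)) / n for _ in range(n_resamples))
    low = stats[int(alpha / 2 * n_resamples)]
    high = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high


# Synthetic per-sample results: 1 = prediction was correct, 0 = wrong.
data_rng = random.Random(42)
correct = [1 if data_rng.random() < 0.85 else 0 for _ in range(400)]

analytic = normal_approx_ci(correct)
boot = percentile_bootstrap_ci(correct)
print("analytic :", analytic)
print("bootstrap:", boot)
```

For a simple metric and a few hundred samples the two intervals should roughly agree; bootstrapping earns its keep for metrics that have no convenient closed-form variance.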

However, in the Python data science world I have rarely seen these being used. I guess part of the reason is the culture, where many data science practitioners don't come from the statistics world. But I think the main reason is that there aren't easy-to-use libraries that do this. While the R language has fantastic support for confidence intervals, for Python there are mostly scattered pieces of code and blog posts.


The confidenceinterval package keeps the clean and popular scikit-learn metric API, e.g. roc_auc_score(y_true, y_pred), but also returns confidence intervals.

It supports analytical computations for many metrics (including AUC with the DeLong method, F1 with macro or micro averaging following recent results, and binary proportions like the TPR using binomial CI methods such as the Wilson interval).
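For readers who haven't met the Wilson interval, here is a small standalone sketch of the standard formula applied to a TPR. This is not taken from the package's source; the function name and the example counts are made up for illustration.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (z=1.959964 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical confusion-matrix counts: TPR = TP / (TP + FN).
true_positives, false_negatives = 8, 2
low, high = wilson_interval(true_positives, true_positives + false_negatives)
print(f"TPR = 0.80, 95% Wilson interval = ({low:.3f}, {high:.3f})")
```

With only 10 positives the interval is roughly (0.49, 0.94), which shows how little a TPR of 0.8 on a tiny sample really tells you. Unlike the plain normal approximation, the Wilson interval stays inside [0, 1] even when the observed proportion is near 0 or 1.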

It can easily be switched to bootstrapping (several bootstrapping methods are supported), and it also gives you an easy way to compute a bootstrap confidence interval for any metric.
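To show what "a confidence interval for any metric" can look like, here is a minimal, library-free sketch of a percentile bootstrap wrapped around an arbitrary metric(y_true, y_pred) callable. It is not the package's implementation: bootstrap_metric_ci, f1_binary, and the (value, (low, high)) return shape are assumptions made for this example, and the labels are synthetic.

```python
import random


def bootstrap_metric_ci(y_true, y_pred, metric, n_resamples=1000,
                        confidence_level=0.95, seed=0):
    """Return (metric value, (low, high)) using a percentile bootstrap."""
    rng = random.Random(seed)
    n = len(y_true)
    values = []
    for _ in range(n_resamples):
        # Resample (label, prediction) pairs together, with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    values.sort()
    alpha = 1 - confidence_level
    low = values[int(alpha / 2 * n_resamples)]
    high = values[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return metric(y_true, y_pred), (low, high)


def f1_binary(y_true, y_pred):
    """Plain binary F1, standing in for 'any metric'."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Synthetic labels and predictions that agree about 80% of the time.
data_rng = random.Random(1)
y_true = [data_rng.randint(0, 1) for _ in range(300)]
y_pred = [t if data_rng.random() < 0.8 else 1 - t for t in y_true]

f1, (ci_low, ci_high) = bootstrap_metric_ci(y_true, y_pred, f1_binary)
print(f"F1 = {f1:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

The only requirement on the metric is that it accepts two equal-length sequences, so the same wrapper would work for precision, recall, or a custom business metric.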




Valuable-Kick7312 t1_jbuwppx wrote

Cool! Does this always assume that the data is drawn iid?


jacobgil OP t1_jc2h94t wrote

Yes. I think confidence intervals assume iid. If the samples are not iid, then the CI could be too short.


Valuable-Kick7312 t1_jc2ziwy wrote

Thank you for the answer!

Just a few notes: In general, confidence intervals do not assume iid. Moreover, in theory, if the data is not drawn iid, the CI can also be smaller. However, I have not encountered this in practice yet.


jonnyyen t1_jbvhdvn wrote

Nice to see a Python implementation of DeLong's method - I've had to use pROC (in R) for that in the past. For binary event analysis (among other things) there's also, which also has bootstrapped confidence intervals, or analytic CI using Wald or Agresti-Coull. The terminology is from weather literature, but it covers a lot of the same ground.


francozzz t1_jbvhf9n wrote

I’ve just been asked to use confidence intervals for a project I’m working on, so this comes as a godsend! Thanks!


Balance- t1_jc16bi6 wrote

Looks awesome!

I would also post at r/Python and/or r/DataScience


jacobgil OP t1_jc2heig wrote

Thanks! Following your suggestion I posted to r/DataScience