Submitted by jacobgil t3_11orezx in MachineLearning
https://github.com/jacobgil/confidenceinterval
pip install confidenceinterval
tldr: You don't have an excuse anymore to not use confidence intervals!
In statistics, confidence intervals are commonly reported alongside accuracy metrics to help interpret them.
For example, an AUC metric might be 0.9 but if the 95% confidence interval is in the range [0.7, 0.96], we can't confidently say we didn't just get lucky - we should be really careful making decisions around that result.
More formally, a confidence interval gives us a range in which the true, unknown accuracy metric could lie. A 95% confidence interval means that if we repeated the experiment many times, 95% of the confidence intervals we reported would contain the true metric (which is unknown). This property is called coverage.
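To make the coverage idea concrete, here is a minimal simulation sketch. It is not code from the package: the true accuracy of 0.8, the sample size of 200, and the simple normal-approximation interval are all assumptions chosen for illustration.

```python
import math
import random


def normal_approx_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) interval for a proportion; z=1.96 is roughly 95%."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def estimate_coverage(true_accuracy=0.8, n=200, repeats=2000, seed=0):
    """Repeat the 'experiment' many times and count how often the interval
    contains the true accuracy (known here only because we simulate it)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        # Each prediction is correct with probability true_accuracy.
        correct = sum(rng.random() < true_accuracy for _ in range(n))
        low, high = normal_approx_interval(correct, n)
        hits += low <= true_accuracy <= high
    return hits / repeats


coverage = estimate_coverage()
print(f"empirical coverage: {coverage:.3f}")
```

If the interval method is well calibrated for this setting, the printed fraction should land close to 0.95.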
Confidence intervals are usually computed either analytically, by making some assumptions about the metric distribution and using the central limit theorem, or by bootstrapping: resampling the results again and again, computing the metric each time, and checking the resulting distribution.
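As a rough sketch of the bootstrapping idea, the simplest variant is the percentile bootstrap. The function name `percentile_bootstrap_ci`, the toy labels, and the choice of accuracy as the metric are assumptions made for this example, and this is not the package's implementation.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def percentile_bootstrap_ci(y_true, y_pred, metric, confidence_level=0.95,
                            n_resamples=2000, seed=0):
    """Percentile bootstrap: resample (label, prediction) pairs with
    replacement, recompute the metric, and take quantiles of the results."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    alpha = 1 - confidence_level
    low = stats[int((alpha / 2) * n_resamples)]
    high = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return low, high


# Toy data: 100 samples, 85 of them predicted correctly.
y_true = [1] * 50 + [0] * 50
y_pred = [1] * 42 + [0] * 8 + [0] * 43 + [1] * 7

point = accuracy(y_true, y_pred)
ci_low, ci_high = percentile_bootstrap_ci(y_true, y_pred, accuracy)
print(f"accuracy={point:.2f}, 95% CI=[{ci_low:.2f}, {ci_high:.2f}]")
```

Any function with the signature `metric(y_true, y_pred)` can be passed in, which is what makes bootstrapping usable for metrics that have no analytical interval.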
However, in the Python data science world, I rarely see these being used. I guess part of the reason is the culture, where many data science practitioners don't come from the statistics world. But I think the main reason is that there aren't easy-to-use libraries that do this. While the R language has fantastic support for confidence intervals, for Python there are mostly scattered pieces of code and blog posts.
The confidenceinterval package keeps the clean and popular scikit-learn metric API, e.g. roc_auc_score(y_true, y_pred), but also returns confidence intervals.
It supports analytical computations for many metrics (including AUC with the DeLong method, F1 with macro and micro averaging, following the recent results from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8936911/#APP2, and binary proportions like the TPR using binomial CI methods like the Wilson interval).
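For a feel of what an analytical binomial method looks like, here is a small standalone sketch of the Wilson score interval applied to a TPR. The function name `wilson_interval` and the counts (45 true positives out of 50 positives) are made up for illustration. It shows the textbook formula, not the package's code.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, confidence_level=0.95):
    """Wilson score interval for a binomial proportion such as the TPR."""
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# Hypothetical counts: 45 of 50 positive samples were detected (TPR = 0.9).
tpr_low, tpr_high = wilson_interval(45, 50)
print(f"TPR=0.90, 95% Wilson CI=[{tpr_low:.3f}, {tpr_high:.3f}]")
```

Unlike the plain normal approximation, the Wilson interval is not centered on the observed proportion and always stays inside [0, 1], which matters for proportions near 0 or 1.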
It can be easily switched to using bootstrapping (with several supported bootstrapping methods),
and also gives you a way to easily compute the confidence interval for any metric with bootstrapping.
Valuable-Kick7312 t1_jbuwppx wrote
Cool! Does this always assume that the data is drawn i.i.d.?