w_is_h OP t1_j0yr1yu wrote
Reply to comment by EmmyNoetherRing in [R] Foresight: Deep Generative Modelling of Patient Timelines using Electronic Health Records by w_is_h
Hi, we did not do that, but we will note it down for the next iteration. During the manual tests we did not see any obvious biases or problems with respect to ethnicity/sex, but a quantitative analysis would probably be good.
EmmyNoetherRing t1_j0yrv8i wrote
Don’t forget to check accuracy by illness category too. Humans have biases because of social issues; machines also pick up biases from the relative shapes/distributions of the various concepts they’re trying to learn, so they’ll do better on the simpler and more common ones. You might get high accuracy on cold/flu cases that show up frequently in the corpus and have very simple treatment paths, and because they show up frequently they may bump up your overall accuracy. At the same time, you want to check how the model handles less common cases whose diagnosis/treatment is likely spread across multiple records over a period of time, like cancer or autoimmune issues.
It’s also a good idea to verify that your simulation process isn’t accidentally stripping the diversity out of the original data, e.g. by generating instances of the rarer or more complex cases that are biased towards traits of the simpler, more common cases. Especially in this context, that might produce nonsensical record paths for the more complex illnesses.
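A minimal sketch of that diversity check, assuming a timeline is just a list of concept codes (the function names, data format, and threshold here are illustrative, not from the paper): compare how often each rare real-world concept appears in the generated cohort versus the real one.

```python
from collections import Counter


def concept_frequencies(timelines):
    """Relative frequency of each concept across a set of patient timelines."""
    counts = Counter(c for timeline in timelines for c in timeline)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def diversity_drift(real, generated, rare_threshold=0.001):
    """For concepts that are rare in the real data, report the ratio
    generated_frequency / real_frequency. Ratios far below 1.0 mean the
    simulation is stripping those rare cases out; 0.0 means they vanished."""
    f_real = concept_frequencies(real)
    f_gen = concept_frequencies(generated)
    return {c: f_gen.get(c, 0.0) / f
            for c, f in f_real.items() if f <= rare_threshold}
```

A per-illness-category version is the same idea with concepts mapped to categories before counting.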
w_is_h OP t1_j0z6n68 wrote
We did not explore intrinsic biases in the data, such as doctors prescribing a certain medication or giving a certain diagnosis because of someone's social status, or because something is more common. This certainly happens; there are many papers on these problems in healthcare, and we actually think the model (Foresight) can be used to explore biases in the data. In the future, we hope to mitigate this by also training the models on medical guidelines and biomedical literature, not just hospital text.
We did analyse the predictions for problems like the model always predicting the most common or the simplest concepts. I will add a histogram of the F1 scores over different concepts to the paper; it shows that the model accurately predicts a very wide range of concepts. We've also done a manual analysis in which 5 clinicians checked the model predictions, and in fact the model is better at predicting complex and rare cases. This is expected, though: common acute conditions are nearly impossible to forecast (saying someone will have the flu in 3 months), whereas complex conditions leave longer trajectories in the record.
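For anyone curious how a per-concept F1 histogram like that is computed, here is a rough sketch, assuming the evaluation yields flat lists of (true, predicted) next-concept pairs; this is an illustrative one-vs-rest calculation, not the paper's exact evaluation code.

```python
from collections import defaultdict


def per_concept_f1(true_next, pred_next):
    """F1 per concept, treating each concept as its own binary task
    over aligned (true, predicted) next-concept pairs."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(true_next, pred_next):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # missed the true concept t
    scores = {}
    for c in set(tp) | set(fp) | set(fn):
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        scores[c] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return scores
```

Histogramming the resulting values then shows whether accuracy is concentrated on a handful of common concepts or spread across the vocabulary.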
EmmyNoetherRing t1_j0zl99m wrote
Nice!