austacious t1_je0g6oi wrote

A healthy skepticism of AI/ML from those in the field is incredibly important and relatively hard to come by. The attitude that 'this is great and everything is wonderful' does not lead to meaningful progress on very real issues. It's very productive to point out the shortcomings of otherwise highly effective models.

12

austacious t1_ircb1ny wrote

The issue with this is that removing bias toward one demographic necessarily harms others. Say you have a hospital whose patients are 80% over the age of 65 and 20% under (substitute whatever more controversial group identities you'd like). Any model will be biased toward, and comparatively over-perform on, the over-65 group; there is simply more data to learn from for that demographic. If you oversample the younger population to equalize outcomes between demographics, your training distribution will no longer match your test distribution. Performance will improve for patients in the under-represented demographic, but overall performance will necessarily decrease. In total, more people will be harmed because of the model's reduced efficacy, but no single demographic will be disproportionately harmed.

It's a question of ethics. The utilitarian would say to keep train and test distributions i.i.d. no matter what, blind to demographics. At the same time, nobody should receive subpar care because of their race, age, or whatever group they belong to.
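To make the trade-off concrete, here's a minimal sketch. The synthetic data, the 80/20 group mix, the group-dependent decision boundary, and the plain logistic regression are all my own illustrative assumptions, not anything from a real hospital; the point is only to show the mechanism, not to claim how big the effect is in practice.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_group(n, shift):
    """Synthetic patients; 'shift' moves this group's feature distribution,
    so one linear model cannot fit both groups perfectly at once."""
    X = rng.normal(loc=shift, size=(n, 5))
    logits = X.sum(axis=1) - 5 * shift + rng.normal(scale=2.0, size=n)
    y = (logits > 0).astype(int)
    return X, y

# 80/20 patient mix, as in the example above (sizes are illustrative)
X_old, y_old = make_group(8000, shift=0.0)   # over-65
X_yng, y_yng = make_group(2000, shift=1.5)   # under-65

# Hold out a test set with the SAME 80/20 mix as the deployed population
X_tr = np.vstack([X_old[:6400], X_yng[:1600]])
y_tr = np.concatenate([y_old[:6400], y_yng[:1600]])
X_te = np.vstack([X_old[6400:], X_yng[1600:]])
y_te = np.concatenate([y_old[6400:], y_yng[1600:]])
grp = np.array([0] * 1600 + [1] * 400)       # 0 = over-65, 1 = under-65

def report(name, model):
    pred = model.predict(X_te)
    overall = (pred == y_te).mean()
    per = [(pred[grp == g] == y_te[grp == g]).mean() for g in (0, 1)]
    print(f"{name}: overall={overall:.3f} over65={per[0]:.3f} under65={per[1]:.3f}")

# (a) Train i.i.d. with the deployment distribution
report("i.i.d.     ", LogisticRegression().fit(X_tr, y_tr))

# (b) Oversample the under-65 group to a 50/50 training mix; the training
#     distribution no longer matches the test distribution.
idx = rng.choice(1600, size=6400, replace=True)
X_bal = np.vstack([X_old[:6400], X_yng[:1600][idx]])
y_bal = np.concatenate([y_old[:6400], y_yng[:1600][idx]])
report("oversampled", LogisticRegression().fit(X_bal, y_bal))
```

In this toy setup, run (b) raises under-65 accuracy and lowers over-65 and overall accuracy on the 80/20 test mix, which is exactly the utilitarian objection above. How large that gap is depends entirely on the data.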

2

austacious t1_irc86vu wrote

I'm not sure if you're aware, but this has actually happened.

The TL;DR is that it drew major backlash from news orgs and the ACLU, the argument being that, since the number of arrests was included in the feature set, the model would reflect existing police biases. It's kinda hard to build a crime-prediction model without historical arrest data, though.
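To illustrate the mechanism (a toy setup of my own, not data from the actual story): if two areas have the same underlying offense rate but different enforcement intensity, their arrest counts will differ, and anything trained on those counts learns the enforcement pattern rather than the crime pattern.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000                       # incidents per area (illustrative)
true_offense_rate = 0.05         # identical in BOTH areas by construction

offenses = rng.random((2, n)) < true_offense_rate
p_arrest = np.array([0.2, 0.6])  # area 1 is policed 3x more heavily
arrests = offenses & (rng.random((2, n)) < p_arrest[:, None])

print("true offense rate:", offenses.mean(axis=1))  # ~[0.05, 0.05]
print("arrest rate      :", arrests.mean(axis=1))   # ~[0.01, 0.03]
# Any model fit to arrest data will rank area 1 as "higher crime",
# reproducing the enforcement disparity, not actual offending.
```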

3