Basically, the raw data can be biased. If you just take all your company's hiring data and feed it into a model, the model will learn to replicate any discriminatory practices that historically existed at your company. (And there are plenty of studies that suggest such bias exists even among well-meaning hiring managers who attempt to be race/gender neutral.)
Suppose you have a raw dataset where 20% of white applicants are hired and only 10% of applicants of color are hired. Even if you exclude the applicants' race from the features used by the model, you will likely end up with a system that is half as likely to hire applicants of color as white applicants. AI is extremely good at extracting patterns from disparate data points, so it will find other, subtler indicators of race and learn to penalize them. Maybe it decides that degrees from historically black universities are less valuable than degrees from predominantly white liberal arts schools. Maybe it decides that guys named DeSean are less qualified than guys named Sean. You get the picture.
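To make that proxy effect concrete, here's a rough toy sketch in Python. Everything in it is made up for illustration (the synthetic numbers, the `proxy` feature standing in for zip code/school/name, the 0.8 penalty), it's just meant to show the mechanism, not any real hiring system:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# Protected attribute: 1 = applicant of color, 0 = white applicant.
race = rng.integers(0, 2, n)

# A "neutral" qualification score, identically distributed for both groups.
skill = rng.normal(0, 1, n)

# A proxy feature that correlates with race (think zip code, school, first name).
proxy = race + rng.normal(0, 0.5, n)

# Historical hiring decisions: driven by skill, but with a penalty applied to
# applicants of color -- the bias we don't want the model to learn.
hired = (skill - 0.8 * race + rng.normal(0, 0.5, n)) > 0.5

# Train on everything EXCEPT race. The proxy is still in there.
X = np.column_stack([skill, proxy])
model = LogisticRegression().fit(X, hired)
pred = model.predict(X)

for group, name in [(0, "white"), (1, "of color")]:
    print(f"predicted hire rate, applicants {name}: {pred[race == group].mean():.2f}")
# The predicted rates diverge even though race was never a feature:
# the model recovers it from the proxy and reproduces the historical gap.
```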
Correcting these biases in the raw data isn't quite the same as filling quotas. The idea is that two equally qualified applicants should have the same likelihood of getting hired, regardless of race. You could have a perfectly unbiased model and still fail to meet a quota because no people of color apply in the first place.
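And here's an equally made-up sketch of that last point: a model can treat equally qualified applicants identically and still produce very few hires of color if very few apply (all the rates below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Only 5% of applicants are people of color, but qualification rates are the
# same in both groups and the model hires 90% of qualified applicants
# regardless of race.
race = (rng.random(n) < 0.05).astype(int)
qualified = rng.random(n) < 0.3
hired = qualified & (rng.random(n) < 0.9)

for group, name in [(0, "white"), (1, "of color")]:
    mask = race == group
    qual_hire_rate = hired[mask & qualified].mean()   # the fairness notion above
    share_of_hires = hired[mask].sum() / hired.sum()  # what a quota would measure
    print(f"{name}: hire rate among qualified = {qual_hire_rate:.2f}, "
          f"share of all hires = {share_of_hires:.1%}")
# Hire rates among the qualified match (~0.90 for both groups), so the model is
# unbiased in that sense -- yet people of color end up as only ~5% of hires,
# simply because few applied.
```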