Submitted by groman434 t3_103694n in MachineLearning
e_for_oil-er t1_j2xz49i wrote
Reply to comment by groman434 in [Discussion] If ML is based on data generated by humans, can it truly outperform humans? by groman434
I guess "errors" in the dataset could be equivalent to introducing noise (random perturbations with mean 0) or a bias (a perturbation with nonzero expectation). I guess those would be the two main kinds of inaccuracies found in data.
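A minimal sketch of that distinction (my own toy example, not from the thread): perturb the same ground-truth labels with a mean-zero Gaussian (noise) versus a Gaussian with nonzero mean (bias), and compare the sample means. The noise averages out; the bias does not.

```python
import numpy as np

rng = np.random.default_rng(0)
true_values = np.full(10_000, 5.0)  # ground-truth labels, all equal to 5.0

# Noise: mean-zero perturbation -- averages out over many samples.
noisy = true_values + rng.normal(loc=0.0, scale=1.0, size=true_values.shape)

# Bias: perturbation with nonzero expectation (here +0.5) -- does NOT average out.
biased = true_values + rng.normal(loc=0.5, scale=1.0, size=true_values.shape)

print(f"noisy mean:  {noisy.mean():.2f}")   # close to 5.0
print(f"biased mean: {biased.mean():.2f}")  # close to 5.5
```

The point being: more data helps against noise, but no amount of data fixes a systematic bias in how the data was generated.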
Bias has plagued some language models that were trained on internet forum data: the training data was biased towards certain opinions, and the model just spat them out. This has caused the creators of some of those models to shut them down. I don't know how one could correct for bias, since this is not at all my area of expertise.
Learning techniques resistant to noise (often called robust methods) are an active area of research, and some of them actually perform really well.
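To illustrate what robustness buys you (again my own toy example, not something from the thread): the classic case is estimating a location parameter when a few labels are grossly wrong. The mean (the least-squares estimate) gets dragged toward the outliers, while the median (a simple robust estimate) barely moves.

```python
import numpy as np

rng = np.random.default_rng(1)
clean = rng.normal(loc=3.0, scale=0.5, size=950)      # 95% good data around 3.0
outliers = rng.normal(loc=50.0, scale=1.0, size=50)   # 5% gross labeling errors
data = np.concatenate([clean, outliers])

mean_est = data.mean()        # least-squares estimate: pulled toward the outliers
median_est = np.median(data)  # robust estimate: barely affected

print(f"mean:   {mean_est:.2f}")    # well above 3.0
print(f"median: {median_est:.2f}")  # still close to 3.0
```

Robust learning methods (e.g. Huber-type losses) generalize this idea: they down-weight samples that look like gross errors instead of letting them dominate the fit.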