
iknowjerome OP t1_ivtti9x wrote

Every dataset has errors and inconsistencies. Some have more than others, but what really matters is how they affect the end goal. Sometimes the level of inconsistency doesn't impact model performance as much as one would expect. In other cases, it is the main cause of poor model performance, at least in one area (for instance, for a specific set of classes). I totally agree with you that companies that succeed in putting and maintaining AI models in production pay particular attention to the quality of the datasets they create for training and testing.

12

that_username__taken t1_ivttxzf wrote

Yeah, I agree, but finding those errors at the end of the cycle is extremely painful and time-consuming.

2

iknowjerome OP t1_ivtw0xs wrote

The trick is not to wait until the end of the cycle to make the appropriate adjustments. There are now a number of solutions on the market that help with understanding and visualizing your image/video data and labels.

5

Mozillah0096 t1_ivtxgd3 wrote

u/iknowjerome can you tell me which solutions you are talking about?

1

jonas__m t1_ix5ey4i wrote

cleanlab is an open-source Python library that checks data and label quality.
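
To give a rough idea of what checking label quality can look like, here is a minimal sketch of one common heuristic: flag examples where a trained model assigns low probability to the label the example was given. This is not cleanlab's API or its actual algorithm; the function name `rank_suspect_labels`, the `threshold` parameter, and the toy data are all made up for illustration.

```python
def rank_suspect_labels(labels, pred_probs, threshold=0.5):
    """Return indices of examples whose given label looks doubtful.

    labels:     list of integer class labels (the possibly noisy annotations)
    pred_probs: list of per-class probability lists from some trained model
    threshold:  hypothetical cutoff; examples scoring below it are flagged
    """
    scored = []
    for i, (label, probs) in enumerate(zip(labels, pred_probs)):
        # "Self-confidence": the probability the model gives to the given label.
        self_confidence = probs[label]
        if self_confidence < threshold:
            scored.append((self_confidence, i))
    # Most suspicious (lowest self-confidence) first.
    scored.sort()
    return [i for _, i in scored]


# Toy data, assumed for illustration only.
labels = [0, 1, 1, 0]
pred_probs = [
    [0.90, 0.10],  # labeled 0, model agrees
    [0.20, 0.80],  # labeled 1, model agrees
    [0.95, 0.05],  # labeled 1, model strongly predicts 0
    [0.40, 0.60],  # labeled 0, model mildly disagrees
]

suspects = rank_suspect_labels(labels, pred_probs)
print(suspects)  # indices to review by hand, most suspicious first
```

The flagged indices are only candidates for manual review. A low score can also mean the model is wrong, so the probabilities should ideally come from out-of-sample predictions (for example, cross-validation).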

2