Submitted by iknowjerome t3_yrfzcf in MachineLearning
iknowjerome OP t1_ivtti9x wrote
Reply to comment by that_username__taken in [R] A relabelling of the COCO 2017 dataset by iknowjerome
Every dataset has errors and inconsistencies. Some have more than others, but what really matters is how they affect the end goal. Sometimes the level of inconsistency doesn't hurt model performance as much as one would expect. In other cases, it is the main cause of poor model performance, at least in one area (for instance, for a specific set of classes). I totally agree with you that companies that succeed in putting AI models into production and maintaining them there pay particular attention to the quality of the datasets created for training and testing.
that_username__taken t1_ivttxzf wrote
Yeah, I agree, but finding those errors at the end of the cycle is extremely painful and time-consuming.
iknowjerome OP t1_ivtw0xs wrote
The trick is not to wait for the end of the cycle to make the appropriate adjustments. And there are now a number of solutions on the market that help with understanding and visualizing your image/video data and labels.
Mozillah0096 t1_ivtxgd3 wrote
u/iknowjerome can you tell me which solutions you are talking about?