Submitted by Waterfront_xD t3_ydc9n1 in MachineLearning
I'm training a machine learning model using YOLOv5 from Ultralytics (arch: YOLOv5s6). The task is to detect and identify laundry symbols. For that, I've scraped and labeled 600 images from Google.
Using this dataset, I get an mAP of around 0.6.
But 600 images is a tiny dataset, and the classes are heavily imbalanced: for some laundry symbols I have only 1-4 training images, while others have 100 or more.
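To quantify the imbalance, I counted how many boxes each class has in the YOLO-format label files. This is just a minimal sketch; the `dataset/labels/train` path is a placeholder for wherever your labels actually live:

```python
from collections import Counter
from pathlib import Path

# Count boxes per class across all YOLO-format label files
# (each line is "<class_id> <x_center> <y_center> <width> <height>").
label_dir = Path("dataset/labels/train")  # placeholder path
counts = Counter()
for label_file in label_dir.glob("*.txt"):
    for line in label_file.read_text().splitlines():
        if line.strip():
            counts[int(line.split()[0])] += 1

for class_id, n in sorted(counts.items()):
    print(f"class {class_id}: {n} boxes")
```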
So I started writing a Python script that generates more images of laundry symbols. The script basically takes a background image and pastes 1-10 randomly positioned laundry symbols onto it, in different colors and rotations. No background is used twice. With that script, I generated around 6,000 entirely different images, so that every laundry symbol appears at least 800 times in the dataset.
Here are examples of the generated data: Link 1 Link 2
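For reference, here is a minimal sketch of what such a copy-paste generator can look like. This is not my exact script; the directory layout, the RGBA symbol cut-outs, and the one-class-per-symbol-file mapping are assumptions, and color jitter is omitted for brevity:

```python
import random
from pathlib import Path
from PIL import Image

backgrounds = sorted(Path("backgrounds").glob("*.jpg"))     # placeholder paths
symbol_paths = sorted(Path("symbols").glob("*.png"))        # RGBA cut-outs, one class per file

out_img = Path("generated/images"); out_img.mkdir(parents=True, exist_ok=True)
out_lbl = Path("generated/labels"); out_lbl.mkdir(parents=True, exist_ok=True)

for i, bg_path in enumerate(backgrounds):                   # each background used only once
    bg = Image.open(bg_path).convert("RGB")
    W, H = bg.size
    lines = []
    for _ in range(random.randint(1, 10)):                  # 1-10 symbols per image
        class_id = random.randrange(len(symbol_paths))
        sym = Image.open(symbol_paths[class_id]).convert("RGBA")
        size = random.randint(min(W, H) // 10, min(W, H) // 4)
        sym = sym.resize((size, size))
        sym = sym.rotate(random.uniform(0, 360), expand=True)  # corners stay transparent
        x = random.randint(0, W - sym.width)
        y = random.randint(0, H - sym.height)
        bg.paste(sym, (x, y), sym)                          # alpha channel as paste mask
        # YOLO label: class x_center y_center width height, normalized to [0, 1]
        # (the box is slightly loose because it includes the transparent rotation margin)
        lines.append(f"{class_id} {(x + sym.width / 2) / W:.6f} {(y + sym.height / 2) / H:.6f} "
                     f"{sym.width / W:.6f} {sym.height / H:.6f}")
    bg.save(out_img / f"gen_{i:05d}.jpg")
    (out_lbl / f"gen_{i:05d}.txt").write_text("\n".join(lines))
```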
I combined the scraped and the generated datasets and retrained the model with the same configuration. The result is really bad: the mAP dropped to 0.15 and the model overfits. The confusion matrix shows why: Confusion matrix
Why is the model learning the background instead of the objects?
First I thought my annotations might be wrong, but the Ultralytics training script saves a few example training-batch images, and there the boxes are drawn perfectly around the generated symbols.
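As an extra spot check I also drew the boxes from the label files onto a generated image myself. A minimal sketch, assuming the hypothetical file names from the generator sketch above:

```python
from pathlib import Path
from PIL import Image, ImageDraw

# Draw the YOLO-format boxes of one generated sample to verify the annotations visually.
img_path = Path("generated/images/gen_00000.jpg")   # placeholder sample
lbl_path = Path("generated/labels/gen_00000.txt")

img = Image.open(img_path).convert("RGB")
W, H = img.size
draw = ImageDraw.Draw(img)
for line in lbl_path.read_text().splitlines():
    class_id, xc, yc, w, h = line.split()
    xc, yc, w, h = float(xc) * W, float(yc) * H, float(w) * W, float(h) * H
    draw.rectangle([xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2], outline="red", width=2)
    draw.text((xc - w / 2, yc - h / 2), class_id, fill="red")
img.save("check_gen_00000.jpg")
```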
For completeness, here are more analytics from the training:
emotional_nerd_ t1_itrn8sd wrote
Hmm. I'm wondering whether this might be due to the added variety of backgrounds, causing the model to get confused by the details.