Submitted by Waterfront_xD t3_ydc9n1 in MachineLearning
I'm training a machine learning model using YOLOv5 from Ultralytics (arch: YOLOv5s6). The task is to detect and identify laundry symbols. For that, I've scraped and labeled 600 images from Google.
Using this dataset, I get an mAP of around 0.6.
But 600 images is a tiny dataset, and the classes are imbalanced: for some laundry symbols I have only 1-4 training images, while others have 100 or more.
So I started writing a Python script which generates more images of laundry symbols. The script basically takes a background image and pastes 1-10 randomly positioned laundry symbols onto it in different colors and rotations. No background is used twice. With that script I generated around 6,000 entirely different images, so that every laundry symbol appears at least 800 times in the dataset.
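A simplified sketch of what the generator does (not my exact script; the paths, scale range, and symbol PNGs are placeholders, and the color jitter is left out):

```python
import random
from pathlib import Path
from PIL import Image

# Placeholder locations, not my real directory layout
BACKGROUNDS = list(Path("backgrounds").glob("*.jpg"))
SYMBOLS = list(Path("symbols").glob("*.png"))  # one PNG with alpha per symbol class

def generate_image(out_img: Path, out_label: Path) -> None:
    bg = Image.open(random.choice(BACKGROUNDS)).convert("RGB")
    W, H = bg.size
    label_lines = []
    for _ in range(random.randint(1, 10)):
        cls = random.randrange(len(SYMBOLS))
        sym = Image.open(SYMBOLS[cls]).convert("RGBA")
        sym = sym.rotate(random.uniform(0, 360), expand=True)
        # scale the symbol to roughly 5-20% of the shorter background side
        s = random.uniform(0.05, 0.20) * min(W, H) / max(sym.size)
        sym = sym.resize((max(1, int(sym.width * s)), max(1, int(sym.height * s))))
        x = random.randint(0, W - sym.width)
        y = random.randint(0, H - sym.height)
        bg.paste(sym, (x, y), sym)  # the alpha channel acts as the paste mask
        # YOLO label format: class x_center y_center width height, all normalized
        label_lines.append(
            f"{cls} {(x + sym.width / 2) / W:.6f} {(y + sym.height / 2) / H:.6f} "
            f"{sym.width / W:.6f} {sym.height / H:.6f}"
        )
    bg.save(out_img)
    out_label.write_text("\n".join(label_lines))
```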
Here are examples of the generated data: Link 1 Link 2
I combined the scraped and the generated dataset and retrained the model with the same configuration. The result is really bad: the mAP dropped to 0.15 and the model overfits. The confusion matrix told me why: Confusion matrix
Why is the model learning the background instead of the objects?
First I thought my annotations might be wrong, but the Ultralytics training script saves a few example training-batch images, and there the boxes are drawn perfectly around the generated symbols.
For completeness, more analytics about the training are below:
mearco t1_itrsfny wrote
In my opinion, the symbols stand out too easily on the backgrounds you are using. The synthetic images you make are too different from the images you actually want to perform well on.
I would focus on collecting more real data or doing more classic data augmentation, for example something like the sketch below.
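Just a sketch with albumentations; it assumes your labels are in YOLO format, and the particular transforms and probabilities are only illustrative:

```python
import albumentations as A

# Photometric/geometric jitter applied to the real scraped photos,
# so the augmented images keep realistic backgrounds and lighting.
transform = A.Compose(
    [
        A.RandomBrightnessContrast(p=0.5),
        A.HueSaturationValue(p=0.3),
        A.Rotate(limit=15, p=0.5),
        A.GaussNoise(p=0.2),
        A.MotionBlur(p=0.2),
    ],
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)

# usage: out = transform(image=image, bboxes=bboxes, class_labels=class_labels)
```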
It would be quite difficult to generate truly realistic synthetic examples. One issue you have is the square colored background around each pasted symbol. You should try using a background remover so that only the black lines of the symbol end up on the background.
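Roughly like this with OpenCV (a sketch; it assumes the symbol crops are dark strokes on a light square patch, and the threshold of 200 is a guess you would need to tune):

```python
import cv2
import numpy as np

def strip_background(symbol_path: str) -> np.ndarray:
    """Make the light square patch transparent so only the symbol strokes remain."""
    img = cv2.imread(symbol_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # dark pixels (the strokes) become opaque, the light background becomes transparent
    _, alpha = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY_INV)
    b, g, r = cv2.split(img)
    return cv2.merge([b, g, r, alpha])  # BGRA; save as PNG to keep the alpha channel

# cv2.imwrite("symbol_rgba.png", strip_background("symbol.png"))
```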