Submitted by [deleted] t3_113o5up in deeplearning
Hi,
I have been trying to draw a bounding box around objects using an ML/NN approach.
The project uses transfer learning: a pretrained VGG16 with a regression head. I selected it because it seems a good architecture and was easy to implement in JavaScript, where you won't find a ready-made version.
The first 16 layers use pretrained weights taken from the Keras creator's GitHub page.
I have trained the output layers on Caltech101 categories: airplanes (800), faces (400), stop signs (60), etc.
It predicts reasonably well on images from the same dataset that the model has not seen before.
Yet for any new image (any picture with a face that I have on my laptop) the predictions are terrible.
After running out of ideas, I am reaching out for some help. I have tried:
- changing the number of layers,
- changing the number of units,
- training some of the inner VGG16 layers
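One way to put a number on "reasonably well" versus "terrible" is intersection over union (IoU) between a predicted box and the ground-truth box. The post does not say which metric was used, so this is only a minimal sketch; the function name `iou` and the `(x_min, y_min, x_max, y_max)` box format are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    # Guard against degenerate (zero-area) boxes.
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 10x10 boxes that overlap on half their width.
    print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

Averaging this score over the held-out Caltech101 images and over a handful of personal photos would show how large the gap between the two really is.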
Solution
I did it with YOLOv7-tiny, using Google Colab. I spent a day finding a project that is usable.
THIS ONE IS FINE > https://github.com/WongKinYiu/yolov
The results are indeed great. Most of the time I found Roboflow extremely handy: I used it to merge datasets, augment them, read tutorials and that kind of thing.
Thanks for your support! Especially to u/PaleontologistDue620
Why didn't VGG work for this task?
- PyTorch has several usable models, one of which is SSD-VGG. It also shows what my idea was lacking: a detection strategy to combine with the CNN.
- Why doesn't PyTorch have YOLO? https://github.com/pytorch/vision/issues/6341
More options
- Apart from SSD-VGG (source code: https://pytorch.org/vision/main/_modules/torchvision/models/detection/ssd.html), there are other models available there. There is also the one I tried myself (YOLOv7: https://github.com/WongKinYiu/yolov)
- A nice implementation of YOLO that is permissively licensed (Apache 2.0, not GPL): https://github.com/Megvii-BaseDetection/YOLOX
[deleted] OP t1_j8rdy53 wrote
One possible reason: https://towardsdatascience.com/r-cnn-fast-r-cnn-faster-r-cnn-yolo-object-detection-algorithms-36d53571365e i.e. the VGG convolutional model on its own is good for classification tasks but not for bounding boxes.