RShuk007
RShuk007 t1_izpxx5b wrote
Reply to Why popular face detection models are failing against cartoons and is there any way to prevent these false positives? by abhijit1247
InsightFace's best-performing models use ResNet-50/100 or ViT-B/L backbones; those are deep models that understand a lot of things. It seems that, because the training data lacks synthetic cartoons, the model does not learn whether a face is human but rather whether it has human proportions/shape/topography.
You can check this by implementing either
https://arxiv.org/abs/2110.11001
or
https://arxiv.org/abs/1610.02391
on your models. These papers fall under explainable AI, a field that tries to show which parts of the input a model looks at when making its final decision. In this case I can see that the model looks at the T region and the mouth. When the face is occluded, it looks only at the T region around the eyes, and a resolution lower than is usual for real images does not seem to change the model's attention. This suggests the model lacks an understanding of human facial texture and detail.
I can see this using a custom package I developed for my work, but I can't show the results here due to confidentiality.
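For anyone who wants to try this kind of check themselves, here is a minimal sketch of a gradient-weighted activation map in the spirit of Grad-CAM (the second link above). It is not the custom package mentioned above. `TinyFaceNet` and `grad_cam` are hypothetical names, and the tiny CNN is only a stand-in so the snippet runs without downloading weights; with a real model you would pass its last convolutional block as `layer`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyFaceNet(nn.Module):
    """Hypothetical stand-in for a real face model: conv backbone + linear head."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(16, 2)  # logits: [not a face, face]

    def forward(self, x):
        feats = self.backbone(x)
        return self.head(feats.mean(dim=(2, 3)))  # global average pooling


def grad_cam(model, layer, image, class_idx):
    """Return an (H, W) heatmap in [0, 1] for one image of shape (3, H, W)."""
    stored = {}
    # Capture the feature maps produced by `layer` during the forward pass.
    handle = layer.register_forward_hook(lambda m, i, o: stored.update(act=o))
    try:
        logits = model(image.unsqueeze(0))
        score = logits[0, class_idx]
        # Gradient of the class score with respect to the captured feature maps.
        grads = torch.autograd.grad(score, stored["act"])[0]
    finally:
        handle.remove()
    acts = stored["act"].detach()
    # One weight per channel: the spatial mean of its gradient.
    weights = grads.mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))
    # Upsample to the input size and scale so the maximum is about 1.
    cam = F.interpolate(cam, size=tuple(image.shape[1:]),
                        mode="bilinear", align_corners=False)[0, 0]
    return cam / (cam.max() + 1e-8)


model = TinyFaceNet().eval()
image = torch.rand(3, 32, 32)  # placeholder for a preprocessed face crop
heatmap = grad_cam(model, model.backbone[-1], image, class_idx=1)
```

Overlaying `heatmap` on the input shows which regions pushed the "face" score up. Comparing the maps for a real face, an occluded face, and a cartoon is one way to see whether the model relies on shape alone. Note that this sketch needs the model's parameters to require gradients, otherwise no graph is built for `torch.autograd.grad`.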
RShuk007 t1_izpydka wrote
Reply to comment by RShuk007 in Why popular face detection models are failing against cartoons and is there any way to prevent these false positives? by abhijit1247
A simple retraining (fine-tuning for a few epochs, updating only the parameters of the later classifier layers) will probably do the trick. I believe the encoder is still good, so you can keep the backbone (ResNet-50 or ViT) frozen.
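A rough sketch of what that could look like in PyTorch, assuming the model can be split into a backbone and a classifier head. `build_model` and `finetune_head` are hypothetical names, the tiny backbone is a placeholder for the pretrained ResNet-50 or ViT encoder, and the epoch count and learning rate are arbitrary.

```python
import torch
import torch.nn as nn


def build_model():
    # Placeholder backbone; in practice this would be the pretrained encoder.
    backbone = nn.Sequential(
        nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )
    head = nn.Linear(8, 2)  # e.g. human face vs. cartoon / not a face
    return backbone, head


def finetune_head(backbone, head, images, labels, epochs=3, lr=1e-2):
    """Train only the head; the backbone weights stay untouched."""
    for p in backbone.parameters():
        p.requires_grad = False  # freeze the encoder
    backbone.eval()  # keep BatchNorm/Dropout behaviour fixed as well

    optimizer = torch.optim.Adam(head.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    losses = []
    for _ in range(epochs):
        with torch.no_grad():  # no graph needed through the frozen part
            feats = backbone(images)
        loss = loss_fn(head(feats), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses


backbone, head = build_model()
images = torch.rand(8, 3, 16, 16)    # placeholder batch; use real faces plus cartoons
labels = torch.randint(0, 2, (8,))   # placeholder labels
losses = finetune_head(backbone, head, images, labels)
```

The training batch should include cartoon faces labelled as negatives, since the point is to teach the head the distinction the original training data never showed it.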