RShuk007 t1_izpydka wrote on December 10, 2022 at 11:40 PM

Reply to comment by RShuk007 in Why popular face detection models are failing against cartoons and is there any way to prevent these false positives? by abhijit1247

A simple retraining will probably do the trick: fine-tune only the parameters of the later classifier layers for a few epochs. I believe the encoder is still good, so you can keep the backbone (ResNet-50 or ViT) frozen.
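A minimal PyTorch sketch of that idea, for illustration only. The tiny `backbone` and `head` modules, the two-class labelling, and the random tensors are all stand-ins I made up; in practice you would load your pretrained encoder and a batch of real faces plus cartoon negatives.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for a pretrained face model: a small "backbone"
# (in practice a ResNet-50 or ViT encoder) followed by a classifier head.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
head = nn.Linear(8, 2)  # assumed classes: 0 = not a real face, 1 = real face
model = nn.Sequential(backbone, head)

# Freeze the encoder so that only the head receives gradient updates.
for p in backbone.parameters():
    p.requires_grad = False
backbone.eval()  # keep any normalization/dropout layers in inference mode

# Only the head's parameters are handed to the optimizer.
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Random tensors standing in for a labelled batch (assumption, not real data).
images = torch.randn(16, 3, 32, 32)
labels = torch.randint(0, 2, (16,))

frozen_before = [p.clone() for p in backbone.parameters()]

for epoch in range(3):  # "a few epochs"
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
```

Passing only `head.parameters()` to the optimizer and setting `requires_grad = False` on the backbone are two separate safeguards; either one alone would leave the encoder weights unchanged, but together they also avoid computing gradients that are never used.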

RShuk007 t1_izpxx5b wrote on December 10, 2022 at 11:36 PM

Reply to Why popular face detection models are failing against cartoons and is there any way to prevent these false positives? by abhijit1247

InsightFace's best-performing models use ResNet-50/100 or ViT-B/L backbones, which are deep models that understand a lot of things. It seems that, because of the lack of synthetic cartoon faces in the training data, the model does not learn whether a face is human, but rather whether a face has human proportions, shape, and topography.

You can check this by applying the methods from these papers to your models:

https://arxiv.org/abs/2110.11001

https://arxiv.org/abs/1610.02391

These papers fall under explainable AI, a field that tries to explain where models look when making their final decisions. In this case I can see that the model looks at the T region and the mouth to make its decisions; when the face is occluded, it looks only at the T region with the eyes. Lower-than-usual resolution of real images does not seem to change the model's attention. This suggests the model lacks an understanding of human facial texture and detail.
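The second link above is the Grad-CAM paper. Here is a rough sketch of how Grad-CAM produces a heatmap, assuming a classifier-style model with convolutional feature maps followed by global average pooling. The tiny `features` and `classifier` modules and the random input are placeholders I made up, not InsightFace's architecture; for a real detector you would pick the last convolutional block and the score you want to explain.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical tiny CNN standing in for the face model's backbone and head.
features = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
)
classifier = nn.Linear(16, 2)  # assumed classes: 0 = no face, 1 = face


def grad_cam(image, target_class):
    """Return an (H, W) heatmap in [0, 1] for one image of shape (3, H, W)."""
    activations = features(image.unsqueeze(0))             # (1, K, h, w)
    logits = classifier(activations.mean(dim=(2, 3)))      # (1, num_classes)

    # Gradient of the chosen class score w.r.t. the feature maps.
    grads = torch.autograd.grad(logits[0, target_class], activations)[0]

    # Channel weights: gradients averaged over the spatial dimensions.
    weights = grads.mean(dim=(2, 3), keepdim=True)         # (1, K, 1, 1)

    # Weighted sum of feature maps, keeping only positive evidence.
    cam = F.relu((weights * activations).sum(dim=1, keepdim=True))

    # Upsample to the input size and normalize for display.
    cam = F.interpolate(cam, size=tuple(image.shape[1:]),
                        mode="bilinear", align_corners=False)
    cam = cam.detach().squeeze()
    return cam / (cam.max() + 1e-8)


heatmap = grad_cam(torch.rand(3, 32, 32), target_class=1)
```

Overlaying `heatmap` on the input image shows which regions pushed the chosen score up. If a cartoon face lights up the same regions as a real face, that would be consistent with the model keying on face geometry rather than texture.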

I can see this using a custom package I developed for my work; however, I can't show the results here due to confidentiality.