Submitted by abhijit1247 t3_zhvdld in deeplearning

This is an image from the LAION-5B dataset. As we can see, the image contains a cartoon of a human, not a real human.

Image from LAION-5B

But popular face detection algorithms have detected the cartoon as a human face.

Output from insightface.

Insightface output

Output from mediapipe face detection.

MediaPipe output

I came across another failure case:

Image from LAION-5B

Insightface output

MediaPipe output
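A minimal sketch of how these detections can be reproduced, assuming standard usage of the two libraries (the image path is a placeholder):

```python
# Sketch: run both detectors on the same image and print their confidence
# scores. Assumes insightface, mediapipe, and opencv-python are installed;
# "laion_sample.jpg" is a placeholder for the LAION-5B image above.
import cv2
import mediapipe as mp
from insightface.app import FaceAnalysis

img = cv2.imread("laion_sample.jpg")

# InsightFace: default detection/recognition model pack
app = FaceAnalysis(name="buffalo_l")
app.prepare(ctx_id=0, det_size=(640, 640))
for face in app.get(img):
    print("insightface:", face.bbox, face.det_score)

# MediaPipe face detection (legacy Solutions API)
with mp.solutions.face_detection.FaceDetection(
        model_selection=1, min_detection_confidence=0.5) as detector:
    results = detector.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    for det in results.detections or []:
        print("mediapipe:", det.location_data.relative_bounding_box, det.score)
```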

6

Comments


deepneuralnetwork t1_izocssw wrote

I mean, it looks like a face. It’s not crazy for a CNN to come to the same conclusion.

If you want the model to ignore cartoon faces, you need to train it to do so. Simple as that.

7

abhijit1247 OP t1_izow7qy wrote

These are pretrained models that we can access through their respective libraries, and according to the Papers with Code rankings they are among the best for face detection. I had hoped that someone who has used these libraries in their application had already solved this issue.

−6

abhijit1247 OP t1_izoxgnp wrote

I understand that, but these are state-of-the-art face detection models (as per Papers with Code), so one would assume they would have taken care of these kinds of false positives. This has been a common issue across many face detection models, and I hoped someone could suggest a model that has been trained against it. Fine-tuning the detector would be my last resort.

−8

abhijit1247 OP t1_izoyefq wrote

Retraining the models would be my last resort, as I still want the high performance of these models, and retraining them would definitely come at the cost of that performance.

−5

Final-Rush759 t1_izp316u wrote

It's not a false positive. The models were trained on pictures that are not far from cartoons. I think the models performed really well.

10

RShuk007 t1_izpxx5b wrote

InsightFace uses ResNet-50/100 or ViT-B/L for its best performance; those are deep models that understand a lot of things. It seems that, because of the lack of synthetic cartoons in the training data, the model does not learn whether a face is human, but rather whether a face has human proportions/shape/topography?

You can check this out by implementing either of these on your models:

https://arxiv.org/abs/2110.11001

https://arxiv.org/abs/1610.02391

These papers fall under explainable AI, a field that tries to explain where models look when making their final decisions. In this case I can see that the model looks at the T region and the mouth to make decisions; when the face is occluded, it only looks at the T region with the eyes, and lower-than-usual resolution of real images does not seem to change the model's attention. This indicates a lack of understanding of human face texture and detail.

I can see this using a custom package I developed for my work; however, I can't show the results here due to confidentiality.
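For reference, a minimal sketch of the same kind of visualization using the open-source pytorch-grad-cam package (not the commenter's confidential tooling); the ImageNet ResNet-50 and the file names are placeholders, not the actual detector:

```python
# Sketch of Grad-CAM (https://arxiv.org/abs/1610.02391) on a stand-in CNN.
# Assumptions: pytorch-grad-cam is installed (pip install grad-cam), the
# ImageNet ResNet-50 stands in for the detector's backbone, and
# "cartoon_face.jpg" is a placeholder path.
import cv2
import numpy as np
import torch
from torchvision.models import resnet50
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.image import preprocess_image, show_cam_on_image

model = resnet50(weights="IMAGENET1K_V1").eval()
target_layers = [model.layer4[-1]]          # last conv block: a common CAM choice

rgb = cv2.cvtColor(cv2.imread("cartoon_face.jpg"), cv2.COLOR_BGR2RGB)
rgb = cv2.resize(rgb, (224, 224)).astype(np.float32) / 255.0
input_tensor = preprocess_image(rgb, mean=[0.485, 0.456, 0.406],
                                std=[0.229, 0.224, 0.225])

cam = GradCAM(model=model, target_layers=target_layers)
heatmap = cam(input_tensor=input_tensor)[0]          # HxW attention map in [0, 1]
overlay = show_cam_on_image(rgb, heatmap, use_rgb=True)
cv2.imwrite("gradcam_overlay.jpg", cv2.cvtColor(overlay, cv2.COLOR_RGB2BGR))
```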

1

RShuk007 t1_izpydka wrote

A simple retraining (fine-tuning for a few epochs, only the parameters of the later classifier layers) will probably do the trick. I believe the encoder is still good, and you can keep the backbone (ResNet-50 or ViT) frozen.
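A minimal PyTorch sketch of that suggestion, assuming a ResNet-50 stands in for the detector's encoder and the dummy tensors are swapped for real training crops:

```python
# Sketch: freeze the pretrained backbone and fine-tune only a small new head
# on a real-face vs. cartoon-face objective. ResNet-50 stands in for the
# actual encoder; the dummy tensors below should be replaced with real crops.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet50

model = resnet50(weights="IMAGENET1K_V1")
for p in model.parameters():                     # keep the encoder frozen
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 2)    # new trainable head: real vs. cartoon

# Placeholder data: substitute real face crops (label 0) and cartoon crops (label 1).
dataset = TensorDataset(torch.randn(16, 3, 224, 224), torch.randint(0, 2, (16,)))
loader = DataLoader(dataset, batch_size=8, shuffle=True)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(3):                           # "fewer epochs", as suggested above
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```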

1

abhijit1247 OP t1_izqxfer wrote

Given the lack of detail in cartoon faces, the model should not detect these cartoon faces as faces with a high confidence score (about 86.8%). If this is the case, then these models are as bad as Haar-cascade-based face detectors.

0

Superschlenz t1_izr0iii wrote

A photo of a face isn't a face either. That's why Apple's Face ID uses a 3D scanner in addition.

1

qpwoei_ t1_izrdg66 wrote

The result you observe is intentional. The training objective of face detection models is usually to detect faces in any kind of picture: drawings, 3D renders, photos.

1

RED_MOSAMBI t1_izrkuct wrote

Because they weren't trained against animated characters, try adding image processing tools that convert animated characters to normal ones, or that pass both normal and animated characters through some common filter.

1

Smallpaul t1_iztcll2 wrote

It’s weird that you say they are failing. If you asked a human to highlight the face in that picture, they would do the exact same thing!

Your application might need something different but don’t call this “failing.” It’s succeeding at what it was designed to do, which is find faces.

What is your application by the way?

1

abhijit1247 OP t1_izwaek9 wrote

Facial attribute analysis, i.e., identifying the gender (one of the attributes) of each detected face. That's why face detection, the preliminary step, has to be done in a way that only human faces are detected.
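For what it's worth, a minimal sketch of that pipeline with a hypothetical is_real_face() filter between detection and attribute analysis; both helper functions are placeholders, not existing library calls:

```python
# Sketch of the intended pipeline: detect faces, drop cartoon detections with a
# hypothetical is_real_face() filter, then run attribute (gender) analysis.
# is_real_face() and predict_gender() are placeholders, not real library calls.
import cv2
from insightface.app import FaceAnalysis

def is_real_face(crop) -> bool:
    # Placeholder: e.g. the frozen-backbone real-vs-cartoon classifier sketched above.
    return True

def predict_gender(crop) -> str:
    # Placeholder for the downstream attribute model.
    return "unknown"

app = FaceAnalysis(name="buffalo_l")
app.prepare(ctx_id=0, det_size=(640, 640))

img = cv2.imread("laion_sample.jpg")               # placeholder path
for face in app.get(img):
    x1, y1, x2, y2 = face.bbox.astype(int)
    crop = img[max(y1, 0):y2, max(x1, 0):x2]
    if not is_real_face(crop):                     # skip cartoons before attribute analysis
        continue
    print(predict_gender(crop), float(face.det_score))
```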

1

abhijit1247 OP t1_izxymfk wrote

Hi all, I came across a new failure case and have added it to the original post. There is a chance that the watermark is causing the issue, but still, the model has detected it as a face with 71.7% confidence.

1