Submitted by abhijit1247 t3_zhvdld in deeplearning

This is an image from the LAION-5B dataset. As we can see, the image contains a cartoon character, not a real human.

Image from LAION-5B

But popular face detection algorithms have detected the cartoon as a human face.

Output from InsightFace.

InsightFace output

Output from MediaPipe face detection.

MediaPipe output

I came across another failure case:

Image from LAION-5B

InsightFace output

MediaPipe output

6

Comments


Final-Rush759 t1_izp316u wrote

It's not a false positive. The models were trained on pictures not far from cartoons. I think the models performed really well.

10

abhijit1247 OP t1_izp78pp wrote

I think these models were trained on real human faces (e.g. http://shuoyang1213.me/WIDERFACE/), not cartoon faces. So the example I have shown would be a false positive.

−5

CauseSigns t1_izp90t9 wrote

Unless a model claims to detect only non-cartoon faces, it’s not a false positive. A face is a face.

7

abhijit1247 OP t1_izqxfer wrote

Given the lack of detail in cartoon faces, the model should not detect them as faces with such a high confidence score (about 86.8%). If this is the case, then these models are as bad as Haar-cascade-based face detectors.

0
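One pragmatic stopgap (not a real fix) is to post-filter detections by a higher score cutoff. Here is a minimal sketch, assuming the detector returns (box, score) pairs; the function and variable names are hypothetical, not any library's actual API:

```python
# Hypothetical post-filter: keep only detections above a score cutoff.
# `detections` stands in for whatever (box, score) pairs a detector returns.

def filter_detections(detections, min_score=0.9):
    """Drop detections whose confidence is below min_score."""
    return [(box, score) for box, score in detections if score >= min_score]

detections = [
    ((10, 10, 50, 50), 0.97),    # e.g. a real face
    ((80, 20, 120, 60), 0.868),  # e.g. the cartoon at ~86.8% confidence
]
kept = filter_detections(detections, min_score=0.9)
print(kept)  # only the 0.97 detection survives
```

Note the obvious trade-off: a cutoff high enough to reject a cartoon scoring 86.8% will also reject any genuine face that happens to score below it.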

mr_birrd t1_izpew7e wrote

That's not how it actually learns, and that's a good thing: it would be complete overfitting if it only gave true positives for the exact images it saw in training.

2

deepneuralnetwork t1_izocssw wrote

I mean, it looks like a face. It’s not crazy for a CNN to come to the same conclusion.

If you want the model to ignore cartoon faces, you need to train it to do so. Simple as that.

7

abhijit1247 OP t1_izoxgnp wrote

I understand that, but these are state-of-the-art face detection models (as per Papers with Code); one would assume they would have taken care of these kinds of false positives. This has been a common issue across many face detection models, and I hoped someone could suggest a model that has been trained against it. Fine-tuning the detector would be my last resort.

−8

deepneuralnetwork t1_izoylbj wrote

I wouldn’t assume anything with SOTA models. “State of the art” is far less impressive in the AI world than it might sound.

5

sqweeeeeeeeeeeeeeeps t1_izo7ejq wrote

Are you even retraining these models on cartoon faces?

5

abhijit1247 OP t1_izow7qy wrote

These are pretrained models, accessible through their respective libraries, and according to the Papers with Code rankings they are among the best for face detection. I had hoped that someone would have used these libraries in their application and solved this issue.

−6

abhijit1247 OP t1_izoyefq wrote

Retraining the models would be my last resort, as I still want their high performance, and retraining them would definitely come at the cost of that performance.

−5

sqweeeeeeeeeeeeeeeps t1_izq7367 wrote

Is this a shit post? These are trained on real human faces. Humans look very different from cartoons.

2

RShuk007 t1_izpxx5b wrote

InsightFace uses ResNet-50/100 or ViT-B/L for its best performance; those are deep models that understand a lot of things. It seems that, because of the lack of synthetic cartoons in the training data, the model does not learn whether a face is human, but rather whether a face has human proportions/shape/topography.

You can check this by implementing https://arxiv.org/abs/2110.11001 or https://arxiv.org/abs/1610.02391 (Grad-CAM) on your models. These papers fall under explainable AI, a field that tries to explain where models look when making their decisions. In this case I can see that the model looks at the T-region and mouth to make decisions; when the face is occluded, it looks only at the T-region with the eyes, and lower-than-usual resolution in real images does not seem to change the model's attention. This indicates a lack of understanding of human face texture and detail.

I can see this using a custom package I developed for my work; however, I can't show the results here due to confidentiality.

1
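The second paper linked above (arXiv:1610.02391) is Grad-CAM, and its core weighting step is simple enough to sketch on toy arrays: global-average-pool the gradients to get per-channel weights, take a weighted sum of the activation maps, then apply a ReLU. This is a NumPy illustration of the math only, with made-up toy inputs, not a drop-in for a real detector:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Toy Grad-CAM: `activations` and `gradients` are (K, H, W) arrays
    taken from one conv layer. Returns an (H, W) heatmap."""
    # Channel weights: global-average-pool the gradients over H and W.
    weights = gradients.mean(axis=(1, 2))             # shape (K,)
    # Weighted sum of the K activation maps, then ReLU.
    cam = np.tensordot(weights, activations, axes=1)  # shape (H, W)
    return np.maximum(cam, 0.0)

# Toy example: 2 channels on a 3x3 map; the ReLU zeroes negative regions.
acts = np.stack([np.arange(9.0).reshape(3, 3), np.ones((3, 3))])
grads = np.stack([np.ones((3, 3)), np.full((3, 3), -2.0)])
heatmap = grad_cam(acts, grads)
print(heatmap)
```

In a real model the activations and gradients come from a backward pass through the detector, and the heatmap is upsampled onto the input image to show which regions (e.g. the T-region and mouth) drove the decision.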

RShuk007 t1_izpydka wrote

A simple retraining (fine-tuning only the parameters of the later classifier layers for a few epochs) will probably do the trick. I believe the encoder is still good, so you can keep the backbone (ResNet-50 or ViT) frozen.

1

abhijit1247 OP t1_izr0f8f wrote

This is a great insight. Thanks for the help.

0

Superschlenz t1_izr0iii wrote

A photo of a face isn't a face either. That's why Apple's Face ID uses a 3D scanner in addition.

1

qpwoei_ t1_izrdg66 wrote

The result you observe is intentional. The training objective of face detection models is usually to detect faces in any kind of picture: drawings, 3D renders, photos.

1

RED_MOSAMBI t1_izrkuct wrote

Because they weren't trained against animated characters, try adding an image-processing step that converts animated characters to a realistic style, or maps both real and animated faces into some common representation.

1

Smallpaul t1_iztcll2 wrote

It’s weird that you say they are failing. If you asked a human to highlight the face in that picture, they would do the exact same thing!

Your application might need something different but don’t call this “failing.” It’s succeeding at what it was designed to do, which is find faces.

What is your application by the way?

1

abhijit1247 OP t1_izwaek9 wrote

Facial attribute analysis, i.e. identifying the gender (one of the attributes) of the detected face. That's why face detection, the preliminary step, has to be done in a way that only real human faces are detected.

1
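Given that pipeline, one way to keep the off-the-shelf detector and still exclude cartoons is a second-stage real-vs-cartoon filter between detection and attribute analysis, along the lines of the fine-tuned-classifier suggestion elsewhere in the thread. A sketch of the wiring, with the three stages passed in as callables (every name here is hypothetical):

```python
def analyze_gender(image, detect_faces, is_real_face, classify_gender):
    """Detect faces, drop non-human (e.g. cartoon) crops with a binary
    filter, then run attribute analysis on what remains."""
    results = []
    for box in detect_faces(image):
        crop = (image, box)  # stand-in for actually cropping the box
        if is_real_face(crop):
            results.append((box, classify_gender(crop)))
    return results

# Dummy stages to show the flow: two detections, one rejected as a cartoon.
image = "photo.jpg"
out = analyze_gender(
    image,
    detect_faces=lambda img: [(0, 0, 10, 10), (20, 20, 30, 30)],
    is_real_face=lambda crop: crop[1] == (0, 0, 10, 10),
    classify_gender=lambda crop: "female",
)
print(out)  # [((0, 0, 10, 10), 'female')]
```

This keeps the high-recall detector untouched and isolates the cartoon problem in a small binary classifier that can be trained separately.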

abhijit1247 OP t1_izxymfk wrote

Hi all, I came across a new failure case and have added it to the original post. There is a chance the watermark is causing the issue, but still, the model detected it as a face with 71.7% confidence.

1