Submitted by Lugi t3_xt01bk in MachineLearning
chatterbox272 t1_iqp67eq wrote
Reply to comment by Lugi in [D] Focal loss - why it scales down the loss of minority class? by Lugi
It is most likely because the focal term ends up over-emphasizing the rare-class term for their task. The focal loss up-weights hard samples (most of which will usually be the rare/object class) and down-weights easy samples (the background/common class). The alpha term is therefore set to re-weight the background class back up, so it doesn't become too easy to ignore. They inherit the nomenclature from class-balanced cross entropy, but they use the term in a different way and are clear as mud about it in the paper.
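To make the interaction concrete, here is a minimal sketch of the binary focal loss from the RetinaNet paper for a single prediction. The function name and scalar formulation are my own for illustration; the defaults (gamma=2, alpha=0.25) are the ones the paper reports. Note that alpha=0.25 multiplies the rare foreground term while 1 - alpha = 0.75 multiplies the common background term, which is exactly the counterintuitive re-balancing discussed above:

```python
import math

def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one binary prediction (illustrative sketch).

    p: predicted probability of the foreground class.
    y: 1 for foreground, 0 for background.
    alpha weights the foreground term; (1 - alpha) weights background.
    """
    if y == 1:
        p_t, alpha_t = p, alpha
    else:
        p_t, alpha_t = 1.0 - p, 1.0 - alpha
    # (1 - p_t)**gamma is the focal term: it shrinks the loss of easy,
    # well-classified samples. alpha_t then re-balances the two classes --
    # with alpha=0.25 the background loss is scaled *up* relative to
    # foreground, pushing back against the focal term's down-weighting.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With gamma=0 and alpha=0.5 this reduces (up to a constant factor) to plain cross entropy, so comparing the two settings shows how strongly the focal term suppresses easy examples.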
I_draw_boxes t1_iqvuh8g wrote
>The alpha term is therefore being set to re-adjust the background class back up, so it doesn't become too easy to ignore.
This is it. The background in RetinaNet far exceeds the foreground, so the default prediction of the network is background, which generates very little loss per anchor in their formulation. Focal loss without alpha is symmetrical, but the targets and behavior of RetinaNet are not.
Alpha might be intended to bring up the loss for common negative examples to keep it in balance with the foreground loss. It might also be intended to bring up the loss for false positives, which are even rarer than the foreground.