Empty-Painter-3868 t1_irtdww2 wrote on October 10, 2022 at 10:04 PM

Great question. In practice, I spend a week crafting a 'good' weak dataset. The result is a modest performance gain and a model that is a lot less predictable (e.g. spans that are off by a token or so).

The correct answer nobody wants to hear is: "I should have spent a week labelling data"

Forget Snorkel and all that crap. It's harder to make good labelling functions than it is to label data, IMO

Seankala t1_iru9kdm wrote on October 11, 2022 at 2:09 AM

I second forgetting about Snorkel and the like. I found it works better for me to just label the datapoints myself and continuously refine the pseudo-labels generated by models.
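
For anyone wondering what that loop can look like, here is a rough sketch, not anything from a real project. It assumes 1-D features and a toy nearest-centroid model so it runs with the standard library only; `fit_centroids`, `predict`, `refine_pseudo_labels` and the `min_margin` threshold are all made-up names for illustration. Confident pseudo-labels get folded into the training set each round, and whatever stays uncertain is what you review by hand.

```python
from statistics import mean


def fit_centroids(train):
    """train: list of (x, label) pairs with 1-D features. Returns {label: centroid}."""
    by_label = {}
    for x, y in train:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}


def predict(centroids, x):
    """Return (label, margin). Needs at least two classes.

    margin = distance to the runner-up centroid minus distance to the best one,
    used here as a crude confidence score (an assumption of this sketch).
    """
    ranked = sorted(centroids, key=lambda y: abs(x - centroids[y]))
    best, second = ranked[0], ranked[1]
    margin = abs(x - centroids[second]) - abs(x - centroids[best])
    return best, margin


def refine_pseudo_labels(labelled, unlabelled, min_margin=2.0, rounds=3):
    """Fold confident pseudo-labels into the training set, round by round.

    Returns (train, pool): the grown training set and the points that
    stayed uncertain, which are the ones worth labelling by hand.
    """
    train, pool = list(labelled), list(unlabelled)
    for _ in range(rounds):
        centroids = fit_centroids(train)
        confident, uncertain = [], []
        for x in pool:
            y, margin = predict(centroids, x)
            (confident if margin >= min_margin else uncertain).append((x, y))
        if not confident:
            break  # nothing new to learn from; hand the rest to a human
        train.extend(confident)
        pool = [x for x, _ in uncertain]
    return train, pool
```

In a real setup the toy model would be your actual tagger or classifier, and the "uncertain" pool is where the manual labelling time goes.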

yldedly t1_irvmn2v wrote on October 11, 2022 at 11:46 AM

>The correct answer nobody wants to hear is: "I should have spent a week labelling data"

... with active learning?
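
Something like uncertainty sampling, roughly. A minimal sketch, assuming you already have per-example class probabilities from some model; `pick_for_annotation` and `budget` are hypothetical names, and entropy is just one of several possible uncertainty scores:

```python
import math


def entropy(probs):
    """Shannon entropy of a probability vector (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def pick_for_annotation(predictions, budget):
    """predictions: {example_id: [class probabilities]}.

    Returns the ids of the `budget` examples the model is least sure about,
    most uncertain first.
    """
    ranked = sorted(predictions, key=lambda ex: entropy(predictions[ex]), reverse=True)
    return ranked[:budget]
```

You label the picked batch, retrain, and repeat, so the week of labelling goes to the examples the model finds hardest.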

ratatouille_artist OP t1_irtgyba wrote on October 10, 2022 at 10:27 PM

I think the devil is in the details. You can use weak supervision to sample from a particular distribution and make your labelling more efficient.
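
As a rough illustration of the sampling idea (a sketch under my own assumptions, not a recipe): use a cheap weak labeller only to decide what goes into the annotation batch, so rare classes are not drowned out. `stratified_annotation_batch`, `weak_label` and `per_class` are hypothetical names; `weak_label` may return `None` when it abstains.

```python
import random
from collections import defaultdict


def stratified_annotation_batch(examples, weak_label, per_class, seed=0):
    """Pick up to `per_class` examples per weak label for manual annotation.

    The weak labels are used only for sampling; the final labels still
    come from the human annotator.
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for ex in examples:
        buckets[weak_label(ex)].append(ex)

    batch = []
    # key=str lets None (abstain) sort alongside string labels
    for label in sorted(buckets, key=str):
        items = buckets[label]
        rng.shuffle(items)
        batch.extend(items[:per_class])
    return batch
```

With plain random sampling a class that shows up in 1% of the documents barely appears in the batch; bucketing by the weak label first puts it in front of the annotator much more often.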

It also works really well in pharma, where you can build and apply ontologies for your weak supervision. Annotation is still hard and still required in that case, but the annotations end up structured and adapted for later use in the ontology, at the cost of slower annotation.
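
To make the ontology part concrete, here is a minimal sketch of dictionary-style weak labelling. The mini-ontology and its concept IDs are invented for illustration, and `ontology_spans` is a hypothetical helper; a real ontology would need synonym handling, tokenisation rules and disambiguation well beyond this.

```python
import re

# Hypothetical mini-ontology: surface form -> concept ID (IDs are made up).
ONTOLOGY = {
    "aspirin": "DRUG:0001",
    "acetylsalicylic acid": "DRUG:0001",
    "ibuprofen": "DRUG:0002",
}


def ontology_spans(text, ontology=ONTOLOGY):
    """Return sorted (start, end, concept_id) character spans.

    Longer terms are matched first, and a match is skipped if it overlaps
    a span that has already been claimed.
    """
    spans = []
    taken = set()
    for term in sorted(ontology, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            covered = set(range(m.start(), m.end()))
            if covered & taken:
                continue
            taken |= covered
            spans.append((m.start(), m.end(), ontology[term]))
    return sorted(spans)
```

Because every span carries a concept ID, the weak labels and the later human corrections both stay tied to the ontology, which is the "structured" part of the trade-off.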