Submitted by ratatouille_artist t3_y0qra7 in MachineLearning
Great question. In practice, I spend a week crafting a 'good' weak dataset. The result is a modest performance gain, and the model becomes a lot more unpredictable (spans off by a token or so).
The correct answer nobody wants to hear is: "I should have spent a week labelling data"
Forget Snorkel and all that crap. It's harder to make good labelling functions than it is to label data, IMO.
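For reference, this is roughly what a Snorkel-style labelling function setup looks like (a minimal sketch assuming the `snorkel` package; the keyword heuristics, the `text` column, and the toy data are all made up for illustration). Writing and debugging a stack of these rules is the work being compared against plain hand-labelling:

```python
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

@labeling_function()
def lf_contains_refund(x):
    # Crude keyword heuristic: easy to write, hard to make precise.
    return POSITIVE if "refund" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_short_message(x):
    # Assume very short messages are uninformative (illustrative rule only).
    return NEGATIVE if len(x.text.split()) < 3 else ABSTAIN

df_train = pd.DataFrame({"text": ["I want a refund now", "ok", "thanks a lot"]})

# Apply the labelling functions and combine their noisy votes into weak labels.
applier = PandasLFApplier(lfs=[lf_contains_refund, lf_short_message])
L_train = applier.apply(df=df_train)

label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train=L_train, n_epochs=100, seed=42)
weak_labels = label_model.predict(L=L_train)
```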
Seankala t1_iru9kdm wrote
I second forgetting about Snorkel and the like. I found it better to just label the data points myself and continuously refine the pseudo labels generated by models.
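A rough sketch of that pseudo-label loop, using scikit-learn; the seed texts, the three self-training rounds, and the 0.9 confidence cutoff are arbitrary choices for illustration, not anything specific from my workflow:

```python
import numpy as np
from scipy.sparse import vstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Small hand-labelled seed set plus a larger unlabelled pool (toy data).
labeled_texts = ["great product", "terrible support"]
labels = np.array([1, 0])
unlabeled_texts = ["love it", "awful experience", "works fine"]

vectorizer = TfidfVectorizer()
X_labeled = vectorizer.fit_transform(labeled_texts)
X_unlabeled = vectorizer.transform(unlabeled_texts)

for _ in range(3):  # a few rounds of self-training
    clf = LogisticRegression().fit(X_labeled, labels)
    probs = clf.predict_proba(X_unlabeled)
    confident = probs.max(axis=1) >= 0.9  # keep only high-confidence pseudo labels
    if not confident.any():
        break
    # Fold the confident pseudo labels back into the training set; in practice
    # you review/refine them by hand before the next round.
    X_labeled = vstack([X_labeled, X_unlabeled[confident]])
    labels = np.concatenate([labels, probs[confident].argmax(axis=1)])
    X_unlabeled = X_unlabeled[~confident]
    if X_unlabeled.shape[0] == 0:
        break
```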
yldedly t1_irvmn2v wrote
>The correct answer nobody wants to hear is: "I should have spent a week labelling data"
... with active learning?
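i.e. something like uncertainty sampling. A minimal sketch with scikit-learn, using synthetic data as a stand-in for the unlabelled pool; the seed size of 20, batch size of 10, and number of rounds are arbitrary:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a pool of mostly unlabelled examples.
X_pool, y_pool = make_classification(n_samples=1000, n_features=20, random_state=0)

labeled_idx = list(np.random.RandomState(0).choice(len(X_pool), size=20, replace=False))

for _ in range(5):
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_pool[labeled_idx], y_pool[labeled_idx])

    # Uncertainty sampling: query the unlabelled points the model is least sure about.
    unlabeled_idx = np.setdiff1d(np.arange(len(X_pool)), labeled_idx)
    probs = clf.predict_proba(X_pool[unlabeled_idx])
    uncertainty = 1.0 - probs.max(axis=1)
    query = unlabeled_idx[np.argsort(uncertainty)[-10:]]

    # In a real workflow a human labels the queried points; here y_pool plays the oracle.
    labeled_idx.extend(query.tolist())
```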
ratatouille_artist OP t1_irtgyba wrote
I think the devil is in the details. You can use weak supervision to sample from a particular distribution and make your labelling more efficient.
It also works really well in pharma, where you can build and apply ontologies for your weak supervision. In that case annotation is still hard and still required, but your annotations end up structured and adapted for later use in the ontology, at the cost of slower annotation.
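To make that concrete, an ontology-driven labelling function for drug mentions might look roughly like this; the tiny ontology dictionary and the span format are invented for illustration, not taken from any specific pharma resource:

```python
import re

# Toy ontology mapping surface forms to concept IDs (illustrative only).
DRUG_ONTOLOGY = {
    "aspirin": "CHEBI:15365",
    "ibuprofen": "CHEBI:5855",
    "paracetamol": "CHEBI:46195",
}

def ontology_spans(text):
    """Weakly label drug mentions as (start, end, concept_id) spans."""
    spans = []
    for surface, concept_id in DRUG_ONTOLOGY.items():
        for match in re.finditer(rf"\b{re.escape(surface)}\b", text.lower()):
            spans.append((match.start(), match.end(), concept_id))
    return sorted(spans)

print(ontology_spans("Patient was given aspirin and ibuprofen."))
# [(18, 25, 'CHEBI:15365'), (30, 39, 'CHEBI:5855')]
```

Because each weak label carries a concept ID, the annotations you do collect stay linked to the ontology and can be reused downstream, which is the trade-off against slower annotation mentioned above.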