Submitted by ratatouille_artist t3_y0qra7 in MachineLearning
I had the pleasure of running a workshop on weak supervision for NLP recently, and I'd like to hear more about your experiences with using weak supervision for NLP.
I am a huge fan of weak supervision personally, and I think skweak is a great tool for span-based weak supervision.
With simple and efficient out-of-the-box machine learning APIs, finetuning and deploying machine learning models has never been easier. The lack of labelled data, however, is a real bottleneck for most projects. Weak supervision can help by:
- labelling data more efficiently
- generating noisy labelled data to finetune your model on
Here's an example skweak labelling function to generate noisy labelled data:
```python
from skweak.base import SpanAnnotator  # labelling functions subclass SpanAnnotator

class MoneyDetector(SpanAnnotator):
    def __init__(self):
        super(MoneyDetector, self).__init__("money_detector")

    def find_spans(self, doc):
        # Skip the first token, which has no left neighbour
        for tok in doc[1:]:
            # A number immediately preceded by a currency symbol, e.g. "$ 20"
            if tok.text[0].isdigit() and tok.nbor(-1).is_currency:
                yield tok.i - 1, tok.i + 1, "MONEY"

money_detector = MoneyDetector()
```
This labelling function extracts any number token that is immediately preceded by a currency symbol.
[Image: example of the labelling function in action]
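If you don't have spaCy to hand, the same rule can be sketched in plain Python over a pre-tokenised list. `CURRENCY_SYMBOLS` and `find_money_spans` below are illustrative stand-ins for spaCy's `Token.is_currency` attribute and skweak's `find_spans`, not part of either library:

```python
# Library-free sketch of the money-detector rule above.
CURRENCY_SYMBOLS = {"$", "€", "£", "¥"}

def find_money_spans(tokens):
    """Yield (start, end, label) spans for number tokens preceded by a currency symbol."""
    for i in range(1, len(tokens)):
        if tokens[i][0].isdigit() and tokens[i - 1] in CURRENCY_SYMBOLS:
            # The span covers both the symbol and the number; end is exclusive.
            yield (i - 1, i + 1, "MONEY")

print(list(find_money_spans(["He", "paid", "$", "20", "yesterday"])))
# → [(2, 4, 'MONEY')]
```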
skweak allows you to combine multiple labelling functions built on spaCy attributes or other methods.
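To convey how combining labelling functions works, here is a toy, library-free sketch of aggregating their noisy outputs by majority vote. (skweak itself offers more sophisticated aggregation, e.g. generative models; the `majority_vote` helper and the vote data below are purely illustrative.)

```python
from collections import Counter

def majority_vote(span_votes):
    """span_votes maps a (start, end) span to the labels proposed by different
    labelling functions; return the most common label per span."""
    return {span: Counter(labels).most_common(1)[0][0]
            for span, labels in span_votes.items()}

votes = {
    (2, 4): ["MONEY", "MONEY", "CARDINAL"],  # two LFs say MONEY, one disagrees
    (5, 6): ["DATE"],                        # only one LF fired on this span
}
print(majority_vote(votes))  # → {(2, 4): 'MONEY', (5, 6): 'DATE'}
```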
Using labelling functions has a number of advantages:
- 💪 larger coverage: a single labelling function can cover many samples
- 🤓 involving experts: domain-expert annotation is expensive, while domain-expert labelling functions are more economical thanks to their coverage
- 🌬️ adapting to changing domains: labelling functions and data assets can be adapted as the domain changes
What are your experiences with weak supervision in NLP? I really recommend trying out skweak, in particular if you work on span extraction.
Ulfgardleo t1_irtdtoj wrote
This feels and sounds like an ad, but I could not figure out for what. Maybe you should make it clear which product I should definitely use.