Submitted by 8hubham t3_znkeh2 in MachineLearning
NinoIvanov t1_j0korab wrote
Classically, you would use some form of "template". In its simplest form this is an "anchor word": within a certain radius around it, other (pre-defined) words are sought, and if a "match" is found, a goal is recognized. The difficulty, evidently, is how to keep false positives and false negatives down and how to "estimate" good templates. The advantage, however, is full explainability: "WHY was that goal suggested" would be exactly traceable. The templates can get arbitrarily involved, e.g. with probabilities, conditional probabilities, dependencies between words and goals, etc.
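To make the anchor-word idea concrete, here is a minimal Python sketch. The goal names, anchor words, cue words and the radius of 3 tokens are all made-up assumptions for illustration, not a recommended template set. Each hit records which anchor and which cue words fired, which is where the explainability comes from.

```python
import re

# Hypothetical templates: goal -> (anchor word, cue words, radius in tokens).
TEMPLATES = {
    "book_flight": ("flight", {"book", "reserve", "buy"}, 3),
    "cancel_order": ("order", {"cancel", "refund"}, 3),
}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def match_goals(text, templates=TEMPLATES):
    """Return (goal, anchor, matched cue words) for every template that fires."""
    tokens = tokenize(text)
    hits = []
    for goal, (anchor, cues, radius) in templates.items():
        for i, tok in enumerate(tokens):
            if tok != anchor:
                continue
            # Look at the tokens within `radius` positions of the anchor.
            window = tokens[max(0, i - radius): i + radius + 1]
            found = sorted(cues.intersection(window))
            if found:
                hits.append((goal, anchor, found))
                break  # one match per goal is enough
    return hits

print(match_goals("I want to book a flight to Oslo"))
```

A sentence such as "The flight was long" contains the anchor but no cue word in range, so no goal is reported; tuning the cue sets and the radius is exactly the false positive / false negative trade-off mentioned above.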
"With machine learning" you could give it a set of "labelled texts", as in, "this text is about this, that text is about that", and have the system progressively reduce the matching words (in the simplest form: a set of words in no particular order) until its ability to recognize a goal from a small "bag of words" has been optimized. You can use e.g. random forests for that, or whatever else you like. Disadvantage: EXPLAINING the goals will be way harder. EDIT: this approach needs an annotated data set; the first one does not, but there you need the templates instead.
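A rough sketch of the "reduce the bag of words" idea, using only the Python standard library. Note that this is NOT a random forest: it swaps in a toy overlap-counting classifier and a greedy word-removal loop, and the four labelled texts and the goal names are invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical annotated data set: (text, goal label).
LABELLED = [
    ("I want to book a flight to Oslo", "book_flight"),
    ("please reserve a flight for me", "book_flight"),
    ("I want to cancel my order", "cancel_order"),
    ("please refund and cancel the order", "cancel_order"),
]

def bag(text):
    """Unordered set of lowercase words in the text."""
    return set(re.findall(r"[a-z']+", text.lower()))

def goal_words(data, vocab):
    """For each goal, collect the vocabulary words seen in its texts."""
    words = defaultdict(set)
    for text, goal in data:
        words[goal] |= bag(text) & vocab
    return words

def predict(text, words):
    """Pick the goal with the largest word overlap; ties go to the
    alphabetically first goal name."""
    b = bag(text)
    return min(words, key=lambda g: (-len(b & words[g]), g))

def accuracy(data, vocab):
    words = goal_words(data, vocab)
    return sum(predict(t, words) == g for t, g in data) / len(data)

def reduce_vocab(data):
    """Greedily drop words as long as training accuracy does not get worse."""
    vocab = set().union(*(bag(t) for t, _ in data))
    best = accuracy(data, vocab)
    for w in sorted(vocab):
        trial = vocab - {w}
        if trial and accuracy(data, trial) >= best:
            vocab = trial
    return vocab

small_vocab = reduce_vocab(LABELLED)
print(small_vocab, accuracy(LABELLED, small_vocab))
```

On a data set this tiny the vocabulary shrinks to very few words, and the score is measured on the training texts themselves, so it says nothing about new texts. With real data you would evaluate on held-out examples and most likely use a proper learner, such as the random forests mentioned above.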
8hubham OP t1_j0lpb55 wrote
Thank you for the suggestions.
I would like to learn more about the first approach. Can you share any links/articles explaining the first approach?
NinoIvanov t1_j0nl601 wrote
A brief intro using regular expressions, giving you the general idea:
https://www.nzini.com/lessons/NLP2+-+Template+Matching.html
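For a quick taste of what a regular-expression template can look like, here is a small Python sketch. It is not taken from the linked lesson; the pattern, the slot names and the example sentence are assumptions chosen for illustration.

```python
import re

# Hypothetical template: "<buyer> bought <item> for <recipient>".
# Named groups act as the slots the template fills in.
PURCHASE = re.compile(
    r"(?P<buyer>[A-Z]\w+) bought (?P<item>\w+) for (?P<recipient>[A-Z]\w+)"
)

def extract_purchase(sentence):
    """Return the filled slots as a dict, or None if the template does not match."""
    m = PURCHASE.search(sentence)
    return m.groupdict() if m else None

print(extract_purchase("John bought flowers for Lucy"))
```

The template only extracts what is literally written; any rewording ("John got Lucy some flowers") needs another pattern.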
Also, classically, look for the "Message Understanding Conferences" and "Information Extraction" & "Named Entity Recognition" as a task.
It gets really tricky if the information is only "implied": "John bought flowers for Lucy" -> "Does John like Lucy?": evidently yes, but nobody SAYS that. Good luck! 😊