bankCC


Which approach would work best for classifying text into two categories when my dataset is really small and imbalanced (4,000 vs. 250 samples)? Each text contains around 200-300 words.
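Whatever model ends up being used, the 4,000/250 split matters. Below is a minimal plain-Python sketch of the "balanced" class-weight heuristic that scikit-learn also uses, n_samples / (n_classes * class_count). It also shows the accuracy a trivial majority-class predictor would already reach. The labels `negative`/`positive` are hypothetical stand-ins for the two categories:

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Same heuristic as scikit-learn's class_weight="balanced":
    the rarer a class, the larger its weight.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical labels mirroring the 4000 / 250 split from the question.
labels = ["negative"] * 4000 + ["positive"] * 250
weights = balanced_class_weights(labels)
print(weights)  # {'negative': 0.53125, 'positive': 8.5}

# Always predicting the majority class already gives ~94% accuracy,
# so accuracy alone says little about the rare class.
majority_accuracy = max(Counter(labels).values()) / len(labels)
print(round(majority_accuracy, 3))  # 0.941
```

With these numbers, each minority example counts 16 times as much as a majority example (8.5 / 0.53125 = 16, which is just 4000 / 250). Precision, recall or F1 on the minority class are more informative than raw accuracy here.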

Most of the time, just one or two words determine the class. I could do a simple keyword search, but misspelled words might slip through, and the dictionary would be quite large and computationally expensive to compare against every file. So I thought ML would be a better idea.
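On the cost concern: exact lookups in a Python set take constant time on average, regardless of dictionary size. Fuzzy matching also only has to run once per distinct word in the corpus if its results are cached. Here is a rough sketch using the standard library's `difflib.get_close_matches`. The keyword list, the `cutoff=0.8` threshold and the helper names `match_token`/`keyword_hits` are all hypothetical placeholders:

```python
import difflib
import re
from functools import lru_cache

# Hypothetical keyword list; the real one would be the large dictionary.
KEYWORDS = sorted({"invoice", "refund", "chargeback"})
KEYWORD_SET = frozenset(KEYWORDS)


@lru_cache(maxsize=None)
def match_token(token, cutoff=0.8):
    """Map one token to a keyword (exact or fuzzy match), or None.

    Cached, so each distinct word in the corpus is fuzzy-matched only once.
    """
    if token in KEYWORD_SET:  # O(1) average-case exact lookup
        return token
    close = difflib.get_close_matches(token, KEYWORDS, n=1, cutoff=cutoff)
    return close[0] if close else None


def keyword_hits(text):
    """Return (token, keyword) pairs for every token that matches a keyword."""
    tokens = re.findall(r"\w+", text.lower())
    return [(tok, kw) for tok in tokens if (kw := match_token(tok)) is not None]


print(keyword_hits("Customer asked for a refnud on the last invoce."))
# [('refnud', 'refund'), ('invoce', 'invoice')]
```

The cutoff is a trade-off. Lower values catch more typos but also produce more false matches, so it would need tuning against the real keyword list.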

I considered a CNN, but the dataset seems far too small to achieve acceptable results.
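A lighter alternative to a CNN for a few thousand documents is a linear model on character n-grams, which is a common baseline. N-grams also tolerate misspellings somewhat, since a typo only breaks the grams that overlap it. Below is a minimal, dependency-free sketch of multinomial Naive Bayes with Laplace smoothing, trained on made-up toy texts. The class name, the `uniform_prior` option and the n-gram size of 4 are illustrative assumptions, not tuned values:

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=4):
    """Space-padded character n-grams of the whole text.

    A typo only breaks nearby grams: 'refnud' and 'refund' still share ' ref'.
    """
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayes:
    """Multinomial Naive Bayes over character n-grams with Laplace smoothing.

    uniform_prior=True ignores class frequencies when scoring, one simple
    way to keep the majority class from dominating predictions.
    """

    def __init__(self, n=4, alpha=1.0, uniform_prior=True):
        self.n, self.alpha, self.uniform_prior = n, alpha, uniform_prior

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.totals = Counter()             # label -> total n-grams seen
        self.class_freq = Counter(labels)   # label -> number of documents
        vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts[label].update(grams)
            self.totals[label] += len(grams)
            vocab.update(grams)
        self.vocab_size = len(vocab)
        return self

    def predict(self, text):
        grams = char_ngrams(text, self.n)
        n_docs = sum(self.class_freq.values())
        best, best_score = None, -math.inf
        for label in self.class_freq:
            # Log prior: zero (uniform) or log of the class frequency.
            score = 0.0 if self.uniform_prior else math.log(self.class_freq[label] / n_docs)
            denom = self.totals[label] + self.alpha * self.vocab_size
            for g in grams:
                score += math.log((self.counts[label][g] + self.alpha) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


# Made-up toy data; the labels are hypothetical.
train_texts = ["refund now", "refund please", "hello team", "lunch today"]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("refnud please"))        # positive
print(model.predict("lunch with the team"))  # negative
```

In practice, a library implementation would be the more usual route, for example scikit-learn's `TfidfVectorizer(analyzer="char_wb")` combined with `LogisticRegression(class_weight="balanced")`. The sketch just shows the idea.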

Any hints are welcome. Thank you very much!
