bankCC t1_ixc6lk0 wrote on November 22, 2022 at 10:03 AM

Reply to comment by Gazorpazzor in [D] Simple Questions Thread by AutoModerator

Thank you very much for the answer! I highly appreciate it. You gave me a really good base to start from. Huge thanks!

bankCC t1_ix9o1ln wrote on November 21, 2022 at 8:28 PM

Reply to [D] Simple Questions Thread by AutoModerator

Which approach would be best for classifying text into 2 categories when my dataset is really small and unbalanced (4000 vs. 250 texts), with each text containing around 200-300 words?

Most of the time, just one or two words decide the classification. I could just do a keyword search, but misspelled words might slip through, and the dictionary would be pretty big and computationally expensive to compare against each file. So I thought ML would be a better idea.
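
To make the keyword idea concrete, here is a minimal sketch of a keyword search that tolerates misspellings, using only Python's standard `difflib`. The keyword list, the function names, and the 0.8 similarity cutoff are all made up for illustration; the real dictionary and threshold would have to come from the actual data.

```python
import difflib
import re

# Hypothetical keyword list (assumption); the real dictionary would be domain-specific.
KEYWORDS = ["refund", "chargeback"]


def find_fuzzy_keywords(text, keywords=KEYWORDS, cutoff=0.8):
    """Return the keywords that at least one token in `text` approximately matches."""
    # Lowercase and split into alphabetic tokens; a set avoids re-checking repeats.
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    hits = set()
    for token in tokens:
        # get_close_matches keeps keywords whose similarity ratio to the token
        # is at least `cutoff` (0.8 is an illustrative choice, not a tuned value).
        hits.update(difflib.get_close_matches(token, keywords, n=1, cutoff=cutoff))
    return hits


def classify(text, keywords=KEYWORDS, cutoff=0.8):
    """Label a text 1 if any keyword is (approximately) present, else 0."""
    return 1 if find_fuzzy_keywords(text, keywords, cutoff) else 0


if __name__ == "__main__":
    print(find_fuzzy_keywords("I want a refnud for my order"))
    print(classify("The weather is nice today"))
```

This still compares every distinct token against every keyword, so it does not remove the cost concern for a large dictionary; it only shows that small misspellings do not have to slip through.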

Maybe a CNN, but the dataset seems way too small to achieve acceptable results.

Any hints are welcome, tyvm.