Submitted by Devinco001 t3_ywjd26 in MachineLearning
goedel777 t1_iwjxp76 wrote
unique(incorrectly_spelled_words)
Devinco001 OP t1_iwjxuul wrote
Yes, I have done that. Even after dropping the duplicates, the count comes to 10M.
goedel777 t1_iwjxz5j wrote
Without seeing the code, it will be impossible to help here.
Devinco001 OP t1_iwmebpe wrote
Sure, but it's just a for loop that goes through the words in the dictionary and uses the Python library 'python-levenshtein' to calculate the distance between each dictionary word and the misspelled word.
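For reference, a minimal sketch of that kind of brute-force loop. This is not the actual code from the project: the pure-Python `levenshtein` function stands in for the python-levenshtein call so the snippet runs without extra packages, and `closest_word` and the toy dictionary are hypothetical names and data.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the row we iterate over as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def closest_word(misspelled: str, dictionary: list) -> str:
    """Scan every dictionary word and keep the one with the smallest distance."""
    return min(dictionary, key=lambda word: levenshtein(misspelled, word))


# Toy data for illustration only.
dictionary = ["apple", "banana", "orange", "grape"]
print(closest_word("aple", dictionary))
```

Each lookup compares the misspelled word against the whole dictionary, so the total work grows with (number of misspelled words) x (dictionary size), which is why 10M unique words takes so long.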
For now, I am skipping Levenshtein in favor of a faster approximate distance, using the SymSpell algorithm. It is highly accurate and much faster. It reduced the computation time from 21 days to 13 hours.
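A rough sketch of the symmetric-delete idea behind SymSpell, to show why it is faster. This is a simplified illustration and not the SymSpell library's API: `deletes`, `build_index`, and `candidates` are hypothetical helper names, and the maximum distance of 2 is an assumed setting.

```python
from collections import defaultdict
from itertools import combinations


def deletes(word: str, max_distance: int) -> set:
    """The word itself plus every string made by deleting up to max_distance characters."""
    variants = {word}
    for k in range(1, min(max_distance, len(word)) + 1):
        for kept in combinations(range(len(word)), len(word) - k):
            variants.add("".join(word[i] for i in kept))
    return variants


def build_index(dictionary, max_distance: int = 2) -> dict:
    """Precompute, once, a map from each delete variant to the dictionary words producing it."""
    index = defaultdict(set)
    for word in dictionary:
        for variant in deletes(word, max_distance):
            index[variant].add(word)
    return index


def candidates(misspelled: str, index: dict, max_distance: int = 2) -> set:
    """Look up the delete variants of the input instead of scanning the whole dictionary."""
    found = set()
    for variant in deletes(misspelled, max_distance):
        found |= index.get(variant, set())
    return found


# Toy data for illustration only.
index = build_index(["apple", "banana", "orange", "grape"])
print(candidates("aple", index))
```

The lookup touches only a handful of hash-table entries per misspelled word, regardless of dictionary size. The returned set can contain false positives, so a full implementation would still verify each candidate with a real edit-distance check and rank the survivors.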