Submitted by Super-Martingale t3_y4w0sw in MachineLearning
Super-Martingale OP t1_isgacv9 wrote
Reply to comment by hjmb in [D] Suggestions for large-scale company name standardization? by Super-Martingale
In the past, I used fuzzy matching plus manual selection for smaller lists of a few thousand strings. For millions of rows, the manual step is not feasible, so we are wondering whether AI-based approaches can help.
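For reference, the fuzzy-matching-plus-manual-selection workflow could look roughly like the sketch below. It uses only Python's standard library (`difflib.SequenceMatcher`). The suffix list, the two thresholds, and the names `normalize` and `best_match` are assumptions made up for illustration, not what was actually run.

```python
import re
from difflib import SequenceMatcher

# Hypothetical set of legal suffixes to ignore; extend it for real data.
LEGAL_SUFFIXES = {
    "inc", "incorporated", "corp", "corporation", "co", "company",
    "ltd", "limited", "llc", "plc", "gmbh",
}


def normalize(name):
    """Lowercase, replace punctuation with spaces, drop trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]+", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def best_match(raw, canonical, auto_accept=0.92, review=0.75):
    """Return (match, score, status) for one raw name.

    status is "auto" (accept without looking), "review" (send to a human),
    or "unmatched". The thresholds are illustrative guesses, not tuned values.
    """
    if not canonical:
        return None, 0.0, "unmatched"
    key = normalize(raw)
    # Score the raw name against every canonical name and keep the best one.
    score, match = max(
        (SequenceMatcher(None, key, normalize(c)).ratio(), c) for c in canonical
    )
    if score >= auto_accept:
        return match, score, "auto"
    if score >= review:
        return match, score, "review"
    return None, score, "unmatched"


canonical_names = ["Apple Inc", "Microsoft Corporation"]
for raw_name in ["APPLE, INC.", "Microsft Corp", "Zebra Widgets"]:
    print(raw_name, "->", best_match(raw_name, canonical_names))
```

Only the "review" bucket goes to manual selection. Each raw name is compared against every canonical name, so this fits the few-thousand case; it does not scale to millions of rows as written.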
hjmb t1_isgaxow wrote
I would be wary: AI approaches tend to give you plausible answers, not true answers. It may also be worth updating your post to make it clear that you're looking for AI solutions to your problem, rather than data cleaning advice for a dataset that you are going to feed into a machine learning system (which is what I inferred).
Super-Martingale OP t1_isgey5g wrote
There is definitely a tradeoff between accuracy and efficiency. We are not sure which approach would be better, so we want to keep the discussion broad.