hellrail

hellrail t1_iskgwjz wrote

No, why should it?

This densification can make it easier to reach a generalizing training state, but that state will probably perform worse than a well-generalized state reached without the augmentation. The augmentation slightly changes the distribution to be learned, because it artificially imposes that a portion of the points are the centers of mass of a triangulation of another portion of the points. That is not generally true of the sensor data that will come in, so the modified distribution has low relevance to the real distribution one wants to learn.
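
To make the point concrete, here is a minimal sketch of what such a densification might look like, assuming it simply adds the centroid of every triangle. The function names and the input layout (points as tuples, triangles as index triples) are my own assumptions for illustration, not the method from the original post.

```python
def centroid(a, b, c):
    """Center of mass of a triangle with vertices a, b, c (equal weights)."""
    return tuple((x + y + z) / 3.0 for x, y, z in zip(a, b, c))


def densify_with_centroids(points, triangles):
    """Return the original points plus one centroid per triangle.

    points:    list of coordinate tuples, e.g. (x, y, z)
    triangles: list of index triples into `points` (assumed to be given,
               e.g. from some triangulation step)
    """
    added = [centroid(points[i], points[j], points[k]) for i, j, k in triangles]
    return list(points) + added


# Hypothetical toy input: a single triangle in the z = 0 plane.
points = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
triangles = [(0, 1, 2)]
dense = densify_with_centroids(points, triangles)
```

Every added point is a deterministic function of three existing points. Real sensor returns do not follow that rule, which is why the augmented set is drawn from a different distribution than the one you actually want to learn.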

3

hellrail t1_isjxlxf wrote

You need to find a method to turn these names into feature vectors such that similar names naturally cluster together in feature space. Start with standard string similarities to build the feature vector. If that does not produce sufficiently unambiguous clusters, proceed to lemmatization methods, and if that is still not sufficient, try pretrained models to generate the feature encoding.
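
A rough sketch of the first step (plain string similarity), using only the Python standard library. Here each name's feature vector is its similarity to every name in the list, and names are grouped by a simple threshold. The 0.7 threshold, the single-link grouping and the example names are assumptions for illustration; tune or replace them for your data.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity in [0, 1] based on matching subsequences."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def feature_vectors(names):
    """One vector per name: its similarity to every name in the list."""
    return [[similarity(n, m) for m in names] for n in names]


def cluster(names, threshold=0.7):
    """Group names whose pairwise similarity reaches the threshold.

    Single-link grouping via union-find: two names end up in the same
    cluster if a chain of sufficiently similar names connects them.
    """
    parent = list(range(len(names)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    vectors = feature_vectors(names)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if vectors[i][j] >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


# Hypothetical example names.
names = ["Jon Smith", "John Smith", "Acme Corp", "ACME Corp."]
clusters = cluster(names)
```

If the clusters from this step are still ambiguous, that is the point at which to move on to lemmatization or a pretrained encoder.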

2