Submitted by jsonathan t3_y5h8i4 in MachineLearning
hellrail t1_isjl10j wrote
Well, first, the augmentation is totally correlated with the original points, so it adds absolutely no new information. Second, that approach enlarges the input size, and typically one wants the opposite.
Therefore I'd say densifying point clouds artificially for training purposes is nonsense.
kakhaev t1_isjpe62 wrote
Your first point seems reasonable but isn't obvious to me; I would be convinced if a model trained with augmented point clouds performed better than one trained without them.
And it's not like we use all points in our model anyway. For example, for object detection from lidar you need a way to handle a variable number of points, because each iteration gives you a different number of points from the sensor. Of course you can do preprocessing, but I hope you get the point.
Usually augmentation lets you sample your input/output space more densely, which leads to a better mapping function for your model to learn.
I also take issue with the fact that the interpolation OP uses is linear, but nothing stops you from modifying the code yourself if necessary.
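To make the linearity concern concrete, here is a minimal sketch (hypothetical, not OP's actual code) of midpoint-based densification. Every added point is a deterministic linear function of two originals, which is why the augmentation is fully correlated with the input:

```python
import numpy as np

def densify_midpoints(points: np.ndarray) -> np.ndarray:
    """Append the midpoint of each consecutive pair of points to the cloud."""
    mids = 0.5 * (points[:-1] + points[1:])  # linear interpolation at t = 0.5
    return np.concatenate([points, mids], axis=0)

cloud = np.array([[0.0, 0.0, 0.0],
                  [2.0, 0.0, 0.0],
                  [2.0, 2.0, 0.0]])
dense = densify_midpoints(cloud)  # 3 original points + 2 interpolated midpoints
```

Swapping in a nonlinear interpolant, as suggested above, only changes the interpolation rule; the synthetic points are still a function of the originals.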
VaporSprite t1_isk3ryn wrote
Correct me if I'm wrong (I'm far from an expert), but couldn't training a model with more data that doesn't inherently add information potentially lead to overfitting?
hellrail t1_iskgwjz wrote
No, why should it?
This densification can make it easier to reach a generalizing training state. But that state will probably perform worse than a well-generalized state trained without the augmentation, because the augmentation slightly changes the distribution to be learned: it artificially imposes that a portion of the points are the centers of mass of a triangulation of another portion of points. That is not generally the case for incoming sensor data, so the modified distribution has low relevance to the real distribution one wants to learn.
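A minimal sketch (hypothetical, not OP's code) of the artifact being described: the augmented cloud contains points that are exactly the centers of mass of triangles over other points, a geometric regularity that real sensor returns do not have.

```python
import numpy as np

def add_triangle_centroids(points: np.ndarray, triangles) -> np.ndarray:
    """Augment a cloud with the center of mass of each triangle.

    `triangles` is a list of index triples; in practice it would come
    from a triangulation routine (e.g. Delaunay), computed elsewhere.
    """
    centroids = np.array([points[list(tri)].mean(axis=0) for tri in triangles])
    return np.concatenate([points, centroids], axis=0)

cloud = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
dense = add_triangle_centroids(cloud, [(0, 1, 2)])
# the appended point is exactly the triangle's centroid, an
# artificial regularity imposed on the learned distribution
```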
hellrail t1_iskhso6 wrote
> Usually augmentation allow you to increase sample of your input/output space that will lead to better map function that your model will learn.

More data gives better results in general, yes, but if the additional data is worthless, it's a bit of a scam. That will become apparent in a comparison against an equally well-trained state without the augmentation (which might be harder to reach), tested on relevant data.
Technically put: the learned distribution is altered toward a surrogate point-cloud distribution that is quite similar to the relevant distribution of real-world sensor measurements, but is no longer the same. That's the price of getting more training data this way, and I wouldn't pay it, because my primary goal is to capture the relevant distribution as closely as possible.
dingdongkiss t1_isl2pco wrote
Yeah, densifying seems pointless if production inference data is going to be as sparse as the inputs here. Estimating the distribution of points and sampling from it seems more useful.
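One simple way to do what this comment suggests is kernel density estimation: a sketch (assumed approach, with a hypothetical `bandwidth` parameter) that draws synthetic points from a Gaussian KDE fitted to the cloud, rather than placing them deterministically between originals.

```python
import numpy as np

rng = np.random.default_rng(0)

def kde_resample(points: np.ndarray, n: int, bandwidth: float = 0.1) -> np.ndarray:
    """Draw n synthetic points from a Gaussian kernel density estimate
    of the cloud: pick a random original point, then jitter it with
    Gaussian noise of scale `bandwidth`."""
    idx = rng.integers(0, len(points), size=n)
    noise = rng.normal(scale=bandwidth, size=(n, points.shape[1]))
    return points[idx] + noise

cloud = rng.normal(size=(100, 3))       # stand-in for a real lidar scan
synthetic = kde_resample(cloud, 50)     # 50 samples from the estimated density
```

The sampled points are stochastic rather than exact interpolants of the originals, though the estimated density is of course still derived from the same measured cloud.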