chengstark t1_izygwqn wrote

In academia we usually have the data already labeled, but I did one unfortunate project where the annotations were absolute garbage (too many mistakes). Ensuring that the labels are correct should be one of your priorities. From my limited experience, you want collaborators with domain knowledge of the data to make sure the processing is absolutely correct.

Recent developments in self-supervised learning and large general-purpose pretrained models may reduce the number of labeled samples needed. I'm not sure how that would affect your product, but it seems related.
