Submitted by quantifiedvagabond t3_y992gf in MachineLearning
My ML team is looking to buy/source a dataset of videos of people performing certain niche tasks to train a business-critical model. From our research, it seems like Scale AI, Toloka, Appen, Defined AI, and Clickworker offer solutions in that space.
Has anyone used any of these before, and would you recommend (or recommend avoiding) them? Or are we better off just crowdsourcing the data in-house?
suflaj t1_it4no84 wrote
If you have the means to record the dataset in-house, that's the best way. You can talk directly to the annotators and the subjects, you can make sure the data cannot be redistributed unless someone leaks it, and you will have a better grasp of the privacy policies involved. It is also likely to be cheaper.
With external data, it is almost impossible to prove you are allowed to have it, and that data can then just be resold to someone else, potentially a competitor.