Exciting-Engineer646 t1_izd6iej wrote

I hate to ask this, but how well do they understand their data? Bad data, distributions that break assumptions (heavy tails, autocorrelation, etc.), missing data, and all the rest will cause model failures even if they are able to code. If it is worthwhile for your company to use that data, then they need to resource it properly.
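To make those checks concrete, here is a minimal sketch of a first-pass screen for one numeric column, assuming the data arrives as a plain Python list with `None` or NaN for missing entries. The function name `screen_series` and the returned keys are hypothetical, and the numbers it returns are only prompts for a closer look, not pass/fail criteria.

```python
import math
from typing import Optional, Sequence


def screen_series(values: Sequence[Optional[float]]) -> dict:
    """Rough screen of one numeric column (illustrative helper, not a library API)."""
    # Keep only entries that are neither None nor NaN.
    present = [v for v in values if v is not None and not math.isnan(v)]
    n = len(present)
    missing_fraction = 1 - n / len(values) if values else 0.0
    result = {
        "missing_fraction": missing_fraction,
        "excess_kurtosis": None,
        "lag1_autocorr": None,
    }
    if n < 3:
        return result  # too little data to say anything

    mean = sum(present) / n
    dev = [v - mean for v in present]
    m2 = sum(d * d for d in dev) / n
    if m2 == 0:
        return result  # constant column: moments are undefined

    # Excess kurtosis: roughly 0 for normal data, large positive for heavy tails.
    m4 = sum(d ** 4 for d in dev) / n
    result["excess_kurtosis"] = m4 / (m2 * m2) - 3.0

    # Lag-1 autocorrelation. Assumption: dropping missing values treats
    # the neighbours of a gap as adjacent, which is crude for time series.
    cross = sum(dev[i] * dev[i + 1] for i in range(n - 1))
    result["lag1_autocorr"] = cross / (m2 * n)
    return result
```

Calling `screen_series([1, 2, 3, 4, 5, None])` reports one sixth of the values missing. What counts as "too heavy-tailed" or "too autocorrelated" still depends on the model, which is the judgment call a code-free tool cannot make for the requesters.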

2

kayhai OP t1_izd6ldt wrote

Yes, I have sufficient domain knowledge to have meaningful conversations with the requesters.

1

Exciting-Engineer646 t1_izd70cq wrote

Not your knowledge, but theirs! If you want this to be fully code-free, they probably need to own the data end as well.

I trust very few people to know which model their data can support. So even if you can find a code-free solution, you still need to find a scalable solution for data curation and model selection.

2

kayhai OP t1_izd8t7g wrote

Oh, I get what you mean: even if we find a code-free solution, we’d still need them to at least understand the requirements on data quality. Unfortunately, not everyone does, and I am hoping the code-free software can help with that too.

For example, Power BI does a VERY basic screening of the data and indicates whether it is suited to regression or classification and whether certain features should be excluded from the study (because of a weak relationship with the target).
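As a rough illustration of that kind of screening (not how Power BI does it internally, which I don't know), here is a sketch that guesses the task type from the target column and flags features with a weak linear correlation to it. The function names and the thresholds `max_classes=10` and `min_abs_corr=0.1` are assumptions chosen for the example.

```python
import math
from typing import Dict, List, Sequence


def guess_task(target: Sequence, max_classes: int = 10) -> str:
    """Heuristic: non-numeric or few distinct values suggests classification."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in target
    )
    if not numeric or len(set(target)) <= max_classes:
        return "classification"
    return "regression"


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def weak_features(
    features: Dict[str, Sequence[float]],
    target: Sequence[float],
    min_abs_corr: float = 0.1,  # arbitrary cutoff for illustration
) -> List[str]:
    """Names of features whose |correlation| with the target is below the cutoff."""
    return [
        name
        for name, col in features.items()
        if abs(pearson(col, target)) < min_abs_corr
    ]
```

A screen like this only catches linear relationships, so a feature it flags may still matter in a nonlinear model. That is part of why the requesters still need to understand the data themselves.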

1