Submitted by DisWastingMyTime t3_zj6tkm in MachineLearning
RuairiSpain t1_izunqwy wrote
Data science is different from software development. Agile methodologies don't map onto it directly, because the work is split into several distinct stages:
- A discovery stage, where you home in on the question you want to ask and get sample data that simplifies the actual data you'll work on
- Choosing which ML algorithm or strategy comes closest to answering your question with the sample data you have
- Setting up training, validation and test samples, then validating that they are representative of your real data
- Running the model and iterating to improve results, perhaps using hyperparameter optimisation to get the best results for your loss function
- Presenting your results for peer review
- Refactoring your model for performance and deployment
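The "set up training, validation and test samples, then check they're representative" step can be sketched roughly as below. This is a minimal illustration under my own assumptions, not anything from the comment: `train_val_test_split`, `is_representative` and the `tolerance` parameter are hypothetical names, the data is synthetic, and "representative" is reduced to a crude check that each split's mean sits within a fraction of a standard deviation of the full data's mean.

```python
import random
import statistics


def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle a copy of `rows` and cut it into train, validation and test lists."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]  # everything left over goes to training
    return train, val, test


def is_representative(sample, population, tolerance=0.25):
    """Crude heuristic (hypothetical): is the sample mean within `tolerance`
    population standard deviations of the population mean?"""
    pop_mean = statistics.fmean(population)
    pop_sd = statistics.pstdev(population)
    if pop_sd == 0:
        return statistics.fmean(sample) == pop_mean
    return abs(statistics.fmean(sample) - pop_mean) / pop_sd <= tolerance


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.gauss(50, 10) for _ in range(1000)]  # synthetic stand-in for real data
    train, val, test = train_val_test_split(data)
    for name, split in [("train", train), ("val", val), ("test", test)]:
        print(name, len(split), is_representative(split, data))
```

On real data you'd likely want stratified splits and per-feature distribution comparisons rather than a single mean check; the version here is deliberately minimal.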
There is a lot of data science preamble before you get to peer review, so quick feedback loops work differently than they do in software development. The discovery phase is more about understanding the data and extracting the appropriate features to test. It's mostly about applying statistics to your data, which then hints at which ML model to choose. See this article on stats: https://towardsdatascience.com/10-machine-learning-methods-that-every-data-scientist-should-know-3cc96e0eeee9
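As one small example of "applying stats to your data" during discovery, you could rank candidate features by their Pearson correlation with the target. This is my own illustrative sketch, not the method from the linked article: `pearson` and `rank_features` are hypothetical helpers, the feature names are made up, and Pearson only captures linear association, so a low score doesn't prove a feature is useless.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)


def rank_features(features, target):
    """Return (name, r) pairs sorted by |r|, strongest linear association first."""
    scores = [(name, pearson(column, target)) for name, column in features.items()]
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: one informative, one noisy, one constant feature.
    features = {
        "size": [1, 2, 3, 4, 5],
        "noise": [3, 1, 4, 1, 5],
        "constant": [7, 7, 7, 7, 7],
    }
    target = [2, 4, 6, 8, 10]
    for name, r in rank_features(features, target):
        print(f"{name}: r = {r:.3f}")
```

A ranking like this is only a hint for which features and model families to try next; nonlinear relationships would need other statistics or plots.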
The developer stage comes at the tail end, where you refactor the algorithm to make it as fast and explainable as possible. You might also add feedback loops in production to check for model drift; that's where your agile frameworks would potentially be used.
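A production drift check could, for instance, compare a feature's distribution at training time with what the model sees live. The sketch below uses the Population Stability Index (PSI), which is one common choice I'm swapping in for illustration; the comment doesn't name a specific metric. `psi`, `n_bins` and `eps` are hypothetical names, bins are equal-width over the reference range, and the often-quoted "PSI above roughly 0.25 means a significant shift" is a rule of thumb, not a standard.

```python
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between a reference (training) sample and a
    production sample. Bins are equal-width over the reference range; values
    outside that range are clipped into the edge bins."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if the reference is constant

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clip out-of-range values
            counts[idx] += 1
        total = len(values)
        return [max(c / total, eps) for c in counts]  # eps keeps log() finite for empty bins

    p_exp = proportions(expected)
    p_act = proportions(actual)
    # PSI = sum over bins of (actual - expected) * ln(actual / expected)
    return sum((a - e) * math.log(a / e) for e, a in zip(p_exp, p_act))


if __name__ == "__main__":
    reference = [i / 100 for i in range(1000)]      # stand-in for training-time values
    shifted = [x + 5 for x in reference]            # stand-in for drifted production values
    print("no drift:", psi(reference, reference))
    print("drifted:", psi(reference, shifted))
```

In practice you'd run something like this per feature (and on model outputs) on a schedule, and feed alerts back into the iteration loop.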