Submitted by angkhandelwal749 t3_10lxwgd in MachineLearning
For software engineers, versioning and collaborating on code is a reasonably solved problem thanks to GitHub, since the task mostly involves maintaining different copies of plain code in different folders. ML engineers, on the other hand, face the much larger task of versioning not just code but also hyperparameters, data, models, data lineage, and labels, and storing all of this on GitHub does not let you track changes to each of these well. What software or open-source tools are currently used for this? Is there space for a new company to be built here?
Delicious-View-8688 t1_j60s6lt wrote
git for versioning code
dvc for versioning data (and other ML things)
mlflow for managing ml pipelines (overlaps with some parts of dvc)
conda for environment management (yes, it can be slow...)
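To make the split above concrete, here is a minimal sketch of the general idea behind versioning data and hyperparameters next to code: hash the data file, write the hash and the hyperparameters to a small text manifest, and let git version that manifest. This uses only the Python standard library and is not the API of dvc or mlflow; the names `file_digest`, `write_manifest`, `train.csv`, and `run.json` are hypothetical and chosen just for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(data_path: Path, params: dict, manifest_path: Path) -> dict:
    """Record the data hash and hyperparameters in a small JSON file.

    The manifest is tiny and text-based, so it can be committed to git
    even when the data file itself is too large to store there.
    """
    manifest = {
        "data_file": data_path.name,
        "data_sha256": file_digest(data_path),
        "params": params,
    }
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest


def demo() -> tuple:
    """Write a manifest, change the data, and write the manifest again."""
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "train.csv"        # hypothetical data file
        manifest_file = Path(tmp) / "run.json"  # hypothetical manifest name
        params = {"lr": 0.01, "epochs": 5}    # hypothetical hyperparameters

        data.write_text("x,y\n1,2\n")
        first = write_manifest(data, params, manifest_file)

        # Appending a row changes the content hash, so the manifest changes too.
        data.write_text("x,y\n1,2\n3,4\n")
        second = write_manifest(data, params, manifest_file)
        return first, second


if __name__ == "__main__":
    first, second = demo()
    print("data changed:", first["data_sha256"] != second["data_sha256"])
```

Because the manifest changes whenever the data or the hyperparameters change, a plain `git diff` on it shows what differed between two runs. The dedicated tools listed above cover much more than this (remote storage, pipelines, experiment comparison), so treat this only as an illustration of the idea.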