Submitted by Dazzling_Koala6834 t3_zkn9jb in MachineLearning
Hey Reddit,
My friend and I are building a project management platform for AI/data science teams (essentially a JIRA for ML). We aim to develop a data-centric, experimental tool that models the ML pipeline to organize workflows, building off the Agile methodology of software development. Our tool will allow ML engineers to design, track, and manage custom pipelines, data flows, and models, all on the cloud. Below is a list of some features we plan to introduce:
Integrations: Include a host of integrations with MLOps tools (Kubeflow, MLflow, etc.), cloud computing services (AWS, Google Cloud, Azure), and source code management (GitHub, Bitbucket).
Iterations: Allow multiple iterations within pipelines, and divide each iteration into the steps of the ML pipeline (business understanding, data visualization, data pre-processing, model training, model testing, model optimization, and deployment). Include a Kanban chart for each part of the pipeline.
Callbacks: The ability to request a return to an earlier stage of the AI pipeline, either to improve that step yourself (like data preprocessing or model training/development/design) or to ask another team to improve it (we refer to this as callbacks).
Storage: A cloud storage solution to store ML models, datasets, or any other metrics/graphs/whatever ML engineers want to store.
Sketchpad: A sketchpad to design data flows and ML models, and link them to code.
Private Assignment: The ability to assign tasks individually to different roles in a team, and to send vital information privately to specific people. For example, the PM could send the dataset only to the data engineer, the preprocessed data only to an ML engineer (potentially with a differential privacy layer on top of all this), and the packaged model only to an integration engineer.
Chat: A chat/communication platform to interact with your team.
Quantitative Focus: ML is quantitative. The client wants QUANTITATIVE results. Hence, the epic should emphasize quantitative rather than qualitative goals.
Experiments: We redefine “sprints” as “experiments.” We make two changes to sprints. First, experiments DO NOT have any deadlines, so as not to put the engineer under pressure. Second, when describing the experiment, we ask “how” instead of “what.” This provides a heavily qualitative focus on the experiments, with a focus on function rather than immediate deliverability as in software engineering.
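To make the iteration, callback, and experiment ideas above a bit more concrete, here is a minimal Python sketch of one way they could be modeled. It is only an illustration, not an actual implementation of the platform: `Stage`, `Callback`, `Experiment`, `advance`, and `request_callback` are all hypothetical names, and the stage order is simply the one listed under Iterations.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional


class Stage(IntEnum):
    """Pipeline stages, in the order listed under Iterations (assumed order)."""
    BUSINESS_UNDERSTANDING = 1
    DATA_VISUALIZATION = 2
    DATA_PREPROCESSING = 3
    MODEL_TRAINING = 4
    MODEL_TESTING = 5
    MODEL_OPTIMIZATION = 6
    DEPLOYMENT = 7


@dataclass
class Callback:
    """A request to revisit an earlier stage (hypothetical structure)."""
    from_stage: Stage
    to_stage: Stage
    reason: str
    assignee: Optional[str] = None  # another team or person, if any


@dataclass
class Experiment:
    """Stands in for a sprint: it records 'how' and has no deadline field."""
    how: str
    stage: Stage = Stage.BUSINESS_UNDERSTANDING
    callbacks: List[Callback] = field(default_factory=list)

    def advance(self) -> None:
        """Move to the next stage; stays put once deployment is reached."""
        if self.stage < Stage.DEPLOYMENT:
            self.stage = Stage(self.stage + 1)

    def request_callback(
        self, to_stage: Stage, reason: str, assignee: Optional[str] = None
    ) -> Callback:
        """Log a callback and move the experiment back to an earlier stage."""
        if to_stage >= self.stage:
            raise ValueError("a callback must target an earlier stage")
        cb = Callback(self.stage, to_stage, reason, assignee)
        self.callbacks.append(cb)
        self.stage = to_stage
        return cb


# Example: an experiment reaches model training, then goes back a step.
exp = Experiment(how="Compare two imputation strategies on the same split")
for _ in range(3):
    exp.advance()
exp.request_callback(Stage.DATA_PREPROCESSING, "label noise found", "data team")
```

Keeping the callback history on the experiment means a Kanban view per stage could show both where the work currently sits and why it was sent back.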
We would appreciate any feedback on our platform, as well as any problems you guys are facing in data science/ML project management.
Thanks a bunch in advance!
bmrheijligers t1_j00xz4x wrote
Head of data science here. Noble attempt. For a real world use case, reach out to me. Our daily workflow is a long way removed from the idealized image you sketch here.