Submitted by Kaudinya t3_y25fjb in MachineLearning

Running ML workflows involves several hurdles. You connect to a machine through SSH, install the CUDA driver, fetch your code, copy the data, build a Docker image, run the script, watch the process, and so on. Finally, if the machine is a cloud instance, you stop it. The alternative is to use an end-to-end platform, either open source or enterprise.

In an attempt to simplify this, we open-sourced a tool that lets you run ML workflows from the CLI. The workflows actually run in the cloud, and the tool takes care of provisioning infrastructure, setting up the environment, etc. We would be glad to get your feedback on the project [github.com/dstackai/dstack]. See the link in the comment. Many thanks.

14

Comments

mietminderung t1_is152di wrote

Looks similar to DVC. A comparison to what is different from existing tools would be nice.

3

Kaudinya OP t1_is1aapx wrote

Thanks. There is indeed some similarity with DVC in terms of the tool's simplicity, but the two tools solve different use cases. DVC's main use case is managing data. dstack, on the other hand, focuses on provisioning infrastructure on demand so that you can run ML workflows in the cloud as if you were running them locally (think Terraform for ML).

2

cheptsov t1_is1g63l wrote

Hey, the creator of dstack here.
Love DVC and other tools by Iterative.ai. Actually, I was originally inspired by DVC and CML when I first started working on dstack.
As u/Kaudinya mentioned, dstack focuses on provisioning infrastructure and environment in the cloud.
On the other hand, dstack also helps manage data but doesn't use Git for that.
See https://docs.dstack.ai/examples/artifacts/ and https://docs.dstack.ai/examples/deps/

1

mietminderung t1_is1hvel wrote

> dstack focuses on provisioning infrastructure and environment in the cloud.

> See https://docs.dstack.ai/examples/artifacts/ and https://docs.dstack.ai/examples/deps/

DVC also does "provision infra and environment in the cloud" based on your examples. Again, a comparison of specific similarities and differences would be best.

See https://dvc.org/doc/user-guide/pipelines/defining-pipelines

1

cheptsov t1_is1lezk wrote

/u/mietminderung I don't really want to argue, but DVC doesn't provision infrastructure ;) When you run things via DVC, they run locally. When you run things via dstack, they run in the configured cloud account.

3

mietminderung t1_is1oprk wrote

I don’t want to argue either. You probably need to be extremely specific about what “provision of infrastructure” means ;)

1

cheptsov t1_is1p9np wrote

Yup. Basically, dstack allows you to run ML workflows in the cloud as if you did it locally. For example, you can specify how many GPUs you need or how much RAM and dstack will automatically create a cloud instance that satisfies the requirements to run the workflow.
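
To make the idea concrete, here is a minimal sketch of matching resource requirements to an instance type. It is not dstack's actual code or API: the `InstanceType` class, the `pick_instance` function, and every name and price in the catalog are hypothetical and exist only for illustration. It assumes the cheapest instance that satisfies the requirements is the one to create.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class InstanceType:
    """A hypothetical cloud instance type (names and prices are made up)."""
    name: str
    gpus: int
    memory_gib: int
    hourly_cost: float


# Hypothetical catalog; a real tool would query the cloud provider instead.
CATALOG: List[InstanceType] = [
    InstanceType("cpu-small", gpus=0, memory_gib=8, hourly_cost=0.10),
    InstanceType("gpu-1x", gpus=1, memory_gib=16, hourly_cost=0.50),
    InstanceType("gpu-1x-highmem", gpus=1, memory_gib=64, hourly_cost=0.90),
    InstanceType("gpu-4x", gpus=4, memory_gib=192, hourly_cost=3.00),
]


def pick_instance(
    gpus: int,
    memory_gib: int,
    catalog: List[InstanceType] = CATALOG,
) -> Optional[InstanceType]:
    """Return the cheapest instance meeting the requirements, or None."""
    candidates = [
        inst
        for inst in catalog
        if inst.gpus >= gpus and inst.memory_gib >= memory_gib
    ]
    # min() with default=None handles the case where nothing qualifies.
    return min(candidates, key=lambda inst: inst.hourly_cost, default=None)


if __name__ == "__main__":
    # A workflow asking for 1 GPU and 32 GiB of RAM.
    print(pick_instance(gpus=1, memory_gib=32))
```

In this sketch, a request that no catalog entry can satisfy returns `None` rather than raising, so the caller decides how to report it.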

4

BernieFeynman t1_is42mlg wrote

All of this is readily available already on AWS, GCP, and probably Azure.

2

cheptsov t1_is4i031 wrote

I believe you mean that AWS, GCP, and Azure have their own tools to provision infrastructure for ML workflows. Yes, they do.

dstack offers something that none of the cloud vendors offer – a light-weight and developer-friendly CLI that is integrated with Git and can be used from the IDE.

Basically, dstack is a light-weight and developer-friendly alternative to end-to-end MLOps platforms.

3

SatoshiNotMe t1_is4znri wrote

Will this run on a cloud GPU provider (Jarvis, Lambda, TensorDock…)?

1

Kaudinya OP t1_is5bb2a wrote

Currently, dstack only supports AWS but we are considering supporting TensorDock. Do you have any opinion on what to support first?

1

Doubleve75 t1_is19qw2 wrote

Giggso is launching no-code ModelOps that you can use without a CLI.

−1

Kaudinya OP t1_is1adij wrote

Thanks. That is good to know. Will have a look.

1