FuB4R32 t1_j5tdt1c wrote
We use Google Cloud buckets + TensorFlow. It works well since you can always point a VM at a cloud bucket (e.g. TFRecords) and it just has access to the data. I know you can do something similar in JAX; I haven't tried PyTorch. It's the same in a Colab notebook. I'm not sure you can point to a cloud location from a local machine though, so as others are saying the 4090 might not be the best fit here (e.g. you can use a TPU in a Colab notebook to get similar performance).
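Roughly what that looks like in TensorFlow, as a minimal sketch (the bucket path and feature spec here are made up, adjust to however your TFRecords were written):

```python
import tensorflow as tf

# Hypothetical bucket/path -- replace with your own GCS location.
GCS_PATTERN = "gs://my-training-bucket/tfrecords/train-*.tfrecord"

# List the shards straight from the bucket; the VM's service account
# just needs read access.
files = tf.data.Dataset.list_files(GCS_PATTERN, shuffle=True)

# Interleave reads across shards so the GPU isn't starved by I/O.
dataset = files.interleave(
    tf.data.TFRecordDataset,
    cycle_length=8,
    num_parallel_calls=tf.data.AUTOTUNE,
)

def parse_example(serialized):
    # Example feature spec -- must match how the records were written.
    features = {
        "image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized, features)
    image = tf.io.decode_jpeg(parsed["image"], channels=3)
    return image, parsed["label"]

dataset = (
    dataset.map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(64)
    .prefetch(tf.data.AUTOTUNE)
)
```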
FuB4R32 t1_iw8kirz wrote
Reply to comment by sckuzzle in Making a model predict on the basis of a particular value by ole72444
You could also do this at the input if it's hard to edit the training data, e.g. in TensorFlow: https://www.tensorflow.org/api_docs/python/tf/gather
https://www.tensorflow.org/api_docs/python/tf/math/argmax
Generally, it's worth looking into custom ops like these to achieve what you want.
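A toy sketch of the two ops together (the tensors and shapes are invented just to illustrate the idea of selecting inputs by a computed index):

```python
import tensorflow as tf

scores = tf.constant([[0.1, 0.7, 0.2],
                      [0.5, 0.3, 0.2]])          # per-example scores
features = tf.constant([[10., 20., 30.],
                        [40., 50., 60.]])        # per-example features

# Index of the "particular value" for each example.
idx = tf.math.argmax(scores, axis=1)             # -> [1, 0]

# Pull the matching feature out of each row.
selected = tf.gather(features, idx, batch_dims=1)  # -> [20., 40.]
```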
FuB4R32 t1_iu45yrl wrote
Reply to comment by sabeansauce in Question about using more than one gpu for deeplearning tasks. by sabeansauce
Yeah, I think I understand. E.g. Google Cloud has good pricing on K80s, especially if you commit to the usage up front. If you have even a handful of mid-range GPUs, training should be faster anyway since you can reach a larger batch size, but it depends on the details of course.
FuB4R32 t1_iu41a65 wrote
Is this for training or inference? The easiest thing to do is to split the batch across multiple GPUs (data parallelism). If you can't even fit batch size 1 on a single GPU, though, then model parallelism is generally a harder problem.
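For the data-parallel case, a minimal TensorFlow/Keras sketch (the model and per-GPU batch size are placeholders):

```python
import tensorflow as tf

# One model replica per visible GPU; each training step's batch is
# split across the replicas automatically.
strategy = tf.distribute.MirroredStrategy()
print("Replicas:", strategy.num_replicas_in_sync)

# Scale the global batch size with the number of GPUs.
global_batch = 64 * strategy.num_replicas_in_sync

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# model.fit(train_dataset.batch(global_batch), epochs=10)
```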
FuB4R32 t1_j5v9mlr wrote
Reply to comment by Zealousideal-Copy463 in Best cloud to train models with 100-200 GB of data? by Zealousideal-Copy463
Yeah, as long as your VM is in the same region as the bucket it should be fine. Even with 200 GB it doesn't take that long to move data between regions either.
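For a one-off move, `gsutil -m cp -r` between buckets is the usual route; if you'd rather do it from Python, here's a rough sketch using tf.io.gfile (bucket names are hypothetical):

```python
import tensorflow as tf

# Hypothetical buckets -- copy TFRecord shards into a bucket that lives
# in the same region as the training VM.
SRC = "gs://source-bucket/tfrecords"
DST = "gs://training-bucket-us-central1/tfrecords"

for path in tf.io.gfile.glob(SRC + "/*.tfrecord"):
    target = DST + "/" + path.split("/")[-1]
    tf.io.gfile.copy(path, target, overwrite=True)
```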