FuB4R32 t1_j5tdt1c wrote
We use Google Cloud buckets + TensorFlow. It works well since you can always point a VM at a cloud bucket (e.g. TFRecords) and it just has access to the data. I know you can do something similar in JAX; I haven't tried PyTorch. It's the same in a Colab notebook. I'm not sure you can point to a cloud location from a local machine though, so as others are saying the 4090 might not be the best fit here (e.g. you can use a TPU in a Colab notebook to get similar performance).
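Roughly what that looks like in TensorFlow, as a minimal sketch (the bucket path and feature spec here are made up, adjust to however your TFRecords were written):

```python
import tensorflow as tf

# Hypothetical bucket/path -- replace with your own GCS location.
GCS_PATTERN = "gs://my-training-bucket/tfrecords/train-*.tfrecord"

# List the shards straight from the bucket; the VM's service account
# just needs read access.
files = tf.data.Dataset.list_files(GCS_PATTERN, shuffle=True)

# Interleave reads across shards so the GPU isn't starved by I/O.
dataset = files.interleave(
    tf.data.TFRecordDataset,
    cycle_length=8,
    num_parallel_calls=tf.data.AUTOTUNE,
)

def parse_example(serialized):
    # Example feature spec -- must match how the records were written.
    features = {
        "image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized, features)
    image = tf.io.decode_jpeg(parsed["image"], channels=3)
    return image, parsed["label"]

dataset = (
    dataset.map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(64)
    .prefetch(tf.data.AUTOTUNE)
)
```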
FuB4R32 t1_iw8kirz wrote
Reply to comment by sckuzzle in Making a model predict on the basis of a particular value by ole72444
You could also do this at the input if it's hard to edit the training data, e.g. in TensorFlow: https://www.tensorflow.org/api_docs/python/tf/gather
https://www.tensorflow.org/api_docs/python/tf/math/argmax
Generally, it's worth looking into custom ops like these to achieve what you want.
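A toy sketch of the two ops together (the tensors and shapes are invented just to illustrate the idea of selecting inputs by a computed index):

```python
import tensorflow as tf

scores = tf.constant([[0.1, 0.7, 0.2],
                      [0.5, 0.3, 0.2]])          # per-example scores
features = tf.constant([[10., 20., 30.],
                        [40., 50., 60.]])        # per-example features

# Index of the "particular value" for each example.
idx = tf.math.argmax(scores, axis=1)             # -> [1, 0]

# Pull the matching feature out of each row.
selected = tf.gather(features, idx, batch_dims=1)  # -> [20., 40.]
```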
FuB4R32 t1_iu45yrl wrote
Reply to comment by sabeansauce in Question about using more than one gpu for deeplearning tasks. by sabeansauce
Yeah, I think I understand. E.g. Google Cloud has good pricing on K80s, especially if you commit to the usage up front. If you have even a handful of mid-range GPUs, training should be faster anyway since you can reach a larger batch size, but it depends on the details of course.
FuB4R32 t1_iu41a65 wrote
Is this for training or inference? The easiest thing to do is to split the batch across multiple GPUs (data parallelism). If you can't even fit batch size 1 on a single GPU, though, then model parallelism is generally a harder problem.
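For the data-parallel case, a minimal TensorFlow/Keras sketch (the model and per-GPU batch size are placeholders):

```python
import tensorflow as tf

# One model replica per visible GPU; each training step's batch is
# split across the replicas automatically.
strategy = tf.distribute.MirroredStrategy()
print("Replicas:", strategy.num_replicas_in_sync)

# Scale the global batch size with the number of GPUs.
global_batch = 64 * strategy.num_replicas_in_sync

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# model.fit(train_dataset.batch(global_batch), epochs=10)
```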
FuB4R32 t1_j5v9mlr wrote
Reply to comment by Zealousideal-Copy463 in Best cloud to train models with 100-200 GB of data? by Zealousideal-Copy463
Yeah, as long as your VM is in the same region as the bucket it should be fine. Even with 200 GB it doesn't take that long to move data between regions either.
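For a one-off move, `gsutil -m cp -r` between buckets is the usual route; if you'd rather do it from Python, here's a rough sketch using tf.io.gfile (bucket names are hypothetical):

```python
import tensorflow as tf

# Hypothetical buckets -- copy TFRecord shards into a bucket that lives
# in the same region as the training VM.
SRC = "gs://source-bucket/tfrecords"
DST = "gs://training-bucket-us-central1/tfrecords"

for path in tf.io.gfile.glob(SRC + "/*.tfrecord"):
    target = DST + "/" + path.split("/")[-1]
    tf.io.gfile.copy(path, target, overwrite=True)
```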