
Zatania OP t1_ir5dmxw wrote

I load it straight into Colab.

As a test, I downloaded a 1 GB dataset from Kaggle directly into Colab.
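Roughly what I ran, in case it's useful (the dataset slug is just a placeholder for your own):

# Install the Kaggle CLI and authenticate with an API token
# (upload kaggle.json from your Kaggle account settings first)
!pip install -q kaggle
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/kaggle.json && chmod 600 ~/.kaggle/kaggle.json

# Download the dataset straight onto the Colab machine's disk
# ("some-user/some-dataset" is a placeholder slug)
!kaggle datasets download -d some-user/some-dataset
!unzip -qq some-dataset.zip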

1

Top-Perspective2560 t1_ir5eoku wrote

Try uploading it to your Google Drive first.

Then you can mount your drive in your notebook by using:

from google.colab import drive
drive.mount('/content/mnt')

Run the cell and allow access to your Drive when the prompt appears.

In the Files tab in the left-hand pane you should now see a folder called mnt which contains the contents of your Google Drive. To get the path to a file, just right-click on the file > Copy path.
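Once it's mounted, you can use that path directly in your code, e.g. (the filename here is just an example):

import pandas as pd

# Path copied from the Files pane; "mydata.csv" is a placeholder
df = pd.read_csv('/content/mnt/My Drive/mydata.csv')
df.head()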

14

Zatania OP t1_ir5kskz wrote

I'll try this solution; if it works, I'll get back to you.

2

you-get-an-upvote t1_ir9978p wrote

FYI, loading many small files from Drive is very slow. If this applies to you, I recommend zipping the files, uploading the zip to Drive, copying it onto your Colab machine, and unzipping it there.

from google.colab import drive
import os

drive.mount('/content/drive')

# Copy the zipped dataset onto the Colab machine's local disk,
# then unzip it there
!cp '/content/drive/My Drive/foo.zip' '/tmp/foo.zip'
os.chdir("/tmp")
!unzip -qq 'foo.zip'

Otherwise, if your dataloader is trying to copy files over from Drive one at a time, it's going to be really slow.
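Something like this (a rough sketch, assuming the zip above contained images and unzipped to a hypothetical /tmp/foo folder) keeps every read on the fast local disk and only touches one file per item:

import os
from PIL import Image
from torch.utils.data import Dataset

class LocalImageDataset(Dataset):
    """Reads images lazily from the local /tmp copy, not from Drive."""
    def __init__(self, root='/tmp/foo'):
        self.paths = [os.path.join(root, f) for f in os.listdir(root)]

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # One local-disk read per item -- no Drive round-trip
        return Image.open(self.paths[idx]).convert('RGB')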

Also, I'd make sure you're not accidentally loading the entire dataset into RAM (assuming your crash is due to running out of RAM?).
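If it's one big CSV, streaming it in chunks is an easy way to avoid that (the path and chunk size are arbitrary):

import pandas as pd

# Stream the CSV in 100k-row chunks instead of loading it all at once
# ('/tmp/foo.csv' is a placeholder path)
total_rows = 0
for chunk in pd.read_csv('/tmp/foo.csv', chunksize=100_000):
    total_rows += len(chunk)  # do your per-chunk processing here
print(total_rows)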

2

alesi_97 t1_ir742br wrote

Bad advice

Google Drive access bandwidth is limited and far lower than the read speed of the Colab runtime's (temporary) local disk.
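Easy to check yourself by timing the same read from both locations (the paths are placeholders):

import time

def timed_read(path):
    # Read the whole file once and report throughput
    start = time.time()
    with open(path, 'rb') as f:
        n = len(f.read())
    print(f'{path}: {n / (time.time() - start) / 1e6:.1f} MB/s')

timed_read('/content/drive/My Drive/foo.zip')  # via Drive (slow)
timed_read('/tmp/foo.zip')                     # local disk (fast)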

Source: worked on training a CNN for my bachelor's thesis.

2

Top-Perspective2560 t1_ir86fh8 wrote

It may actually solve the problem. I’ve run into similar issues before.

Source: CompSci PhD. I use Colab a lot.

3

Sonoff t1_ir5dzpa wrote

Well, put the files in your Google Drive and mount the drive.

3