Jean-Porte t1_ja9ejvo wrote

You can increase the timeout parameter; it helps.

But I agree, I don't even understand why they don't log things locally on failure instead of KILLING A ONE-WEEK JOB ON A HIGH-END GPU SERVER (MORE THAN $100 WORTH OF COMPUTE TIME).

10

not_particulary OP t1_ja9g6o1 wrote

Yeah but it's super iffy. My exact script works most of the time, so idk even what to fix. That's why I just want to use something else, the software is obviously not stable.

2

Jean-Porte t1_ja9iik5 wrote

>Yeah but it's super iffy. My exact script works most of the time, so idk even what to fix. That's why I just want to use something else, the software is obviously not stable.

Do `export WANDB__SERVICE_WAIT=300`

I don't have that problem anymore

5
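
For anyone who would rather set this from inside the training script than from the shell, here is a minimal sketch. The variable name `WANDB__SERVICE_WAIT` and the value 300 come from the comment above; the helper `set_wandb_service_wait` is a hypothetical name made up for illustration, and the sketch assumes the variable has to be in the environment before wandb is imported and initialized.

```python
import os


def set_wandb_service_wait(seconds: int = 300) -> str:
    """Set WANDB__SERVICE_WAIT for the current process (hypothetical helper).

    Assumption: this runs before wandb is imported and initialized, so the
    value is already in the environment when wandb starts up.
    """
    if seconds <= 0:
        raise ValueError("seconds must be a positive integer")
    # Environment variables are strings, so store the number as text.
    os.environ["WANDB__SERVICE_WAIT"] = str(seconds)
    return os.environ["WANDB__SERVICE_WAIT"]


if __name__ == "__main__":
    # Same effect as `export WANDB__SERVICE_WAIT=300`, but only for this process.
    print(set_wandb_service_wait(300))
```

Call it at the very top of the script, before any wandb import. Setting it in the shell as suggested above works just as well and also covers child processes started from that shell.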