Submitted by [deleted] t3_zfp3g9 in MachineLearning
VirtualHat t1_izfu724 wrote
Reply to comment by 1bir in [D] Workflows for quickly iterating over ideas without free access to super computers by [deleted]
I use three scripts.
train.py (which trains my model)
worker.py (which picks up the next job and runs it using train.py)
runner.py (which holds the list of jobs plus code to display what's happening).
I then have multiple machines running multiple instances of worker.py. When a new job is created, the workers see it and start processing it. Work is broken into 5-epoch blocks, and at the end of each block, a new job from the priority queue is selected.
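For concreteness, here is a minimal sketch of what such a worker loop could look like. It assumes jobs live as JSON files in a shared jobs/ folder and that train.py exposes a train(config, start_epoch, end_epoch) function; the folder layout, job schema, and function names are illustrative, not the actual code from the thread.

```python
# worker.py -- minimal sketch of the job-polling loop described above.
# The jobs/ folder, JSON schema, and train.train() signature are all
# assumptions for illustration; the real code isn't shown in the thread.
import json
import time
from pathlib import Path

import train  # hypothetical: assumes train.py exposes train(config, ...)

JOBS_DIR = Path("jobs")   # hypothetical shared job queue
BLOCK_EPOCHS = 5          # work is chunked into 5-epoch blocks

def next_job():
    """Return the highest-priority unfinished job, or None."""
    candidates = []
    for path in JOBS_DIR.glob("*.json"):
        job = json.loads(path.read_text())
        if job["epoch"] < job["max_epochs"]:
            candidates.append(job)
    # lower number = higher priority
    return min(candidates, key=lambda j: j["priority"], default=None)

def main():
    while True:
        job = next_job()
        if job is None:
            time.sleep(30)  # nothing to do; poll again later
            continue
        # run one 5-epoch block, then go back to the queue
        start = job["epoch"]
        end = min(start + BLOCK_EPOCHS, job["max_epochs"])
        train.train(job["config"], start_epoch=start, end_epoch=end)
        # record progress so any worker can pick up the next block
        # (a real multi-machine setup would also need file locking
        # or atomic job claims to avoid two workers taking one job)
        job["epoch"] = end
        (JOBS_DIR / f"{job['id']}.json").write_text(json.dumps(job))

if __name__ == "__main__":
    main()
```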
This way I can simply add a new job, and within 30 minutes or so one of the workers will finish its current block and pick it up. Also, because of the chunking, I get early results on all the jobs rather than having to wait for them to finish. This is important, as I often know early on whether a run is worth finishing.
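Under the same assumptions, adding a job can be as simple as dropping a new JSON file into the hypothetical jobs/ folder; the schema below just mirrors the worker sketch above and is likewise illustrative.

```python
# runner.py -- hypothetical sketch of enqueueing a job for the workers.
import json
import uuid
from pathlib import Path

JOBS_DIR = Path("jobs")  # same assumed shared folder as in worker.py
JOBS_DIR.mkdir(exist_ok=True)

def add_job(config, priority=0, max_epochs=50):
    job = {
        "id": uuid.uuid4().hex,
        "config": config,
        "priority": priority,   # lower = picked up sooner
        "epoch": 0,             # progress marker updated by the workers
        "max_epochs": max_epochs,
    }
    (JOBS_DIR / f"{job['id']}.json").write_text(json.dumps(job))

# e.g. queue a small learning-rate sweep and watch per-block results
for lr in [1e-2, 1e-3, 1e-4]:
    add_job({"lr": lr})
```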
I evaluate the results in a Jupyter notebook using the logs that each job creates.
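A minimal sketch of that evaluation step, assuming each job writes a CSV log with epoch and val_loss columns under logs/ (the actual log format isn't shown in the thread):

```python
# Notebook cell: compare runs by plotting validation loss per job.
# The logs/ path and column names are assumptions for illustration.
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd

for path in sorted(Path("logs").glob("*.csv")):
    df = pd.read_csv(path)
    plt.plot(df["epoch"], df["val_loss"], label=path.stem)

plt.xlabel("epoch")
plt.ylabel("val_loss")
plt.legend()
plt.show()
```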
edit: fixed links.
moyle t1_izgsce9 wrote
Guild.ai can easily automate this process. I really recommend checking it out.
VirtualHat t1_izgvx9j wrote
This looks great.
RSchaeffer t1_izgxqod wrote
These links don't work for me. Can you double check them?
thundergolfer t1_izgyu6x wrote
They're not actually links; they've just been formatted like they are. They just link to train.py, which is not a website.
VirtualHat t1_izjmbm0 wrote
Oh, my bad, I didn't realise Reddit automatically creates links when you write something like abc.xyz. I've edited the reply to include links to my code.