Submitted by [deleted] t3_zfp3g9 in MachineLearning
VirtualHat t1_izfu724 wrote
Reply to comment by 1bir in [D] Workflows for quickly iterating over ideas without free access to super computers by [deleted]
I use three scripts.
train.py (which trains my model)
worker.py (which picks up the next job and runs it using train.py)
runner.py (which holds the list of jobs plus code to display what's happening).
I then have multiple machines running multiple instances of worker.py. When a new job is created, the workers see it and start processing it. Work is broken into 5-epoch blocks, and at the end of each block, a new job from the priority queue is selected.
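For concreteness, here is a minimal sketch of what such a worker loop could look like. It assumes jobs live as JSON files in a shared jobs/ folder and that train.py exposes a train(config, start_epoch, end_epoch) function; the folder layout, job schema, and function names are illustrative, not the actual code from the thread.

```python
# worker.py -- minimal sketch of the job-polling loop described above.
# The jobs/ folder, JSON schema, and train.train() signature are all
# assumptions for illustration; the real code isn't shown in the thread.
import json
import time
from pathlib import Path

import train  # hypothetical: assumes train.py exposes train(config, ...)

JOBS_DIR = Path("jobs")   # hypothetical shared job queue
BLOCK_EPOCHS = 5          # work is chunked into 5-epoch blocks

def next_job():
    """Return the highest-priority unfinished job, or None."""
    candidates = []
    for path in JOBS_DIR.glob("*.json"):
        job = json.loads(path.read_text())
        if job["epoch"] < job["max_epochs"]:
            candidates.append(job)
    # lower number = higher priority
    return min(candidates, key=lambda j: j["priority"], default=None)

def main():
    while True:
        job = next_job()
        if job is None:
            time.sleep(30)  # nothing to do; poll again later
            continue
        # run one 5-epoch block, then go back to the queue
        start = job["epoch"]
        end = min(start + BLOCK_EPOCHS, job["max_epochs"])
        train.train(job["config"], start_epoch=start, end_epoch=end)
        # record progress so any worker can pick up the next block
        # (a real multi-machine setup would also need file locking
        # or atomic job claims to avoid two workers taking one job)
        job["epoch"] = end
        (JOBS_DIR / f"{job['id']}.json").write_text(json.dumps(job))

if __name__ == "__main__":
    main()
```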
This way I can simply add a new job, and within 30 minutes or so one of the workers will finish its current block and pick it up. Also, because of the chunking, I get early results on all the jobs rather than having to wait for them to finish. This is important, as I often know early on whether a run is worth finishing.
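Under the same assumptions, adding a job can be as simple as dropping a new JSON file into the hypothetical jobs/ folder; the schema below just mirrors the worker sketch above and is likewise illustrative.

```python
# runner.py -- hypothetical sketch of enqueueing a job for the workers.
import json
import uuid
from pathlib import Path

JOBS_DIR = Path("jobs")  # same assumed shared folder as in worker.py
JOBS_DIR.mkdir(exist_ok=True)

def add_job(config, priority=0, max_epochs=50):
    job = {
        "id": uuid.uuid4().hex,
        "config": config,
        "priority": priority,   # lower = picked up sooner
        "epoch": 0,             # progress marker updated by the workers
        "max_epochs": max_epochs,
    }
    (JOBS_DIR / f"{job['id']}.json").write_text(json.dumps(job))

# e.g. queue a small learning-rate sweep and watch per-block results
for lr in [1e-2, 1e-3, 1e-4]:
    add_job({"lr": lr})
```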
I evaluate the results in a Jupyter notebook using the logs that each job creates.
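A minimal sketch of that evaluation step, assuming each job writes a CSV log with epoch and val_loss columns under logs/ (the actual log format isn't shown in the thread):

```python
# Notebook cell: compare runs by plotting validation loss per job.
# The logs/ path and column names are assumptions for illustration.
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd

for path in sorted(Path("logs").glob("*.csv")):
    df = pd.read_csv(path)
    plt.plot(df["epoch"], df["val_loss"], label=path.stem)

plt.xlabel("epoch")
plt.ylabel("val_loss")
plt.legend()
plt.show()
```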
edit: fixed links.
moyle t1_izgsce9 wrote
Guild.ai can easily automate this process. I really recommend checking it out.
VirtualHat t1_izgvx9j wrote
This looks great.
RSchaeffer t1_izgxqod wrote
These links don't work for me. Can you double check them?
thundergolfer t1_izgyu6x wrote
They're not actually links; they've just been formatted like they are. They just link to train.py, which is not a website.
VirtualHat t1_izjmbm0 wrote
Oh, my bad, I didn't realise Reddit automatically creates links when you write something like abc.xyz. I've edited the reply to include links to my code.