Submitted by nateharada t3_10do40p in MachineLearning
nateharada OP t1_j4otocf wrote
Reply to comment by Fit_Schedule5951 in [P] A small tool that shuts down your machine when GPU utilization drops too low. by nateharada
This tool doesn't look at memory right now, only at actual compute utilization. Loading your model usually takes up close to the maximum GPU memory until training is done, even if compute usage is very low, so memory usage wouldn't tell you much.
If your training is hanging but still burning GPU cycles, that would be harder to detect, I think.
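To illustrate the difference between the two signals, here is a rough sketch (not necessarily how the tool itself does it) that reads both compute utilization and memory use from `nvidia-smi` and decides idleness from utilization alone. The helper names `parse_gpu_stats`, `query_gpu_stats`, and `all_idle`, and the 10% threshold, are made up for this example.

```python
import subprocess


def parse_gpu_stats(csv_text):
    """Parse lines like '3, 15000' into (utilization_percent, memory_used_mib)."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, mem = (field.strip() for field in line.split(","))
        stats.append((int(util), int(mem)))
    return stats


def query_gpu_stats():
    """Ask nvidia-smi for per-GPU compute utilization and memory in use."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=utilization.gpu,memory.used",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    return parse_gpu_stats(out)


def all_idle(stats, util_threshold=10):
    """True only if at least one GPU was reported and every GPU is below
    the (assumed) utilization threshold. Memory is deliberately ignored,
    since it stays high for as long as the model is loaded."""
    return bool(stats) and all(util < util_threshold for util, _mem in stats)
```

A caller would poll `all_idle(query_gpu_stats())` every so often and only act after it has stayed true for a while, so a short pause between epochs doesn't trigger anything.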
bay_der t1_j4papbd wrote
One approach I've figured out is to put a watch on the log file.
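A minimal sketch of what that could look like, assuming the training script appends to its log regularly, so a log that hasn't been modified for a while suggests a hang. The function names and the 600-second timeout are just placeholders for illustration.

```python
import os
import time


def seconds_since_update(log_path, now=None):
    """Seconds elapsed since the log file was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(log_path)


def is_stalled(log_path, timeout_s=600, now=None):
    """True if the log has not been written to within timeout_s seconds.

    timeout_s is an assumed value; it should be longer than the slowest
    expected gap between log lines (e.g. a long evaluation step).
    """
    return seconds_since_update(log_path, now) > timeout_s
```

Because this checks progress rather than GPU activity, it might also catch the case where a run hangs while still burning GPU cycles, as long as the hang stops the log from being written.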