Submitted by Troll_of_the_bridge t3_y3zm5p in deeplearning

Hello all,

I've been looking at using a small neural network (~500 trainable weights) implemented in PyTorch to solve a regression problem where my features and targets are originally stored in double precision. I've experimented with setting my features, targets, and NN weights to both single and double precision, but have noticed only a negligible difference in the time it takes to train the model over a fixed number of epochs. I've found this to be true when training on either CPU (AMD Ryzen 9 5900X 12-Core Processor 3.70 GHz) or GPU (RTX 2070 Super, CUDA 11.7).

I've also run this experiment when training an MLP on the Fashion MNIST data set exactly as described in the PyTorch quickstart tutorial (https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html). In this case, the per-epoch training time on my GPU drops only from 5.1 to 4.8 seconds when switching from double to single precision weights.

Why don't I see a more significant difference in training time between these precisions? Any insight would be appreciated!
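For anyone who wants to reproduce the comparison, here is a minimal timing sketch. It is not the code from the post: the network shape (8 inputs, 48 hidden units, about 500 weights), the random data, and the helper name `time_training` are all assumptions made for illustration. It does a few warm-up steps and calls `torch.cuda.synchronize()` before reading the clock, since CUDA kernels run asynchronously and unsynchronized timings can be misleading.

```python
import time

import torch
from torch import nn


def time_training(dtype, device="cpu", steps=200, n_samples=256,
                  n_features=8, hidden=48):
    """Return wall-clock seconds for `steps` full-batch SGD steps at `dtype`."""
    torch.manual_seed(0)
    # Random stand-in data (assumption: the real features/targets are not known)
    x = torch.randn(n_samples, n_features, dtype=dtype, device=device)
    y = torch.randn(n_samples, 1, dtype=dtype, device=device)
    # 8*48 + 48 + 48 + 1 = 481 trainable weights, roughly the size in the post
    model = nn.Sequential(
        nn.Linear(n_features, hidden), nn.Tanh(), nn.Linear(hidden, 1)
    ).to(device=device, dtype=dtype)
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)
    loss_fn = nn.MSELoss()

    def step():
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()

    for _ in range(10):  # warm-up so one-time setup cost is not measured
        step()
    if device == "cuda":
        torch.cuda.synchronize()  # wait for queued GPU work before timing
    start = time.perf_counter()
    for _ in range(steps):
        step()
    if device == "cuda":
        torch.cuda.synchronize()
    return time.perf_counter() - start


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    for dtype in (torch.float32, torch.float64):
        print(dtype, f"{time_training(dtype, device):.3f} s")
```

Increasing `hidden` and `n_samples` by a few orders of magnitude is one way to see at what problem size the two precisions start to diverge on a given machine.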

2

Comments


suflaj t1_isbgg32 wrote

Probably because the startup overhead dominates the processing time. A 500-weight network is not representative of real-world use: modern neural networks have 100+ million parameters even on consumer hardware, and they are not trained on a dataset that is considered solved.
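The overhead argument can be sketched as a toy model in which each training step costs a fixed overhead plus the arithmetic work divided by throughput. Every number below is a hypothetical value chosen for illustration (the 50 microsecond overhead, the FP32 rate, and the 1/32 FP64 rate are assumptions, not measurements):

```python
def step_time(flops, throughput_flops_per_s, overhead_s):
    """Time for one training step under a simple fixed-overhead model."""
    return overhead_s + flops / throughput_flops_per_s


# Hypothetical numbers, for illustration only
OVERHEAD_S = 50e-6           # assumed fixed per-step cost (launches, Python)
FP32_RATE = 9e12             # assumed FP32 throughput in FLOP/s
FP64_RATE = FP32_RATE / 32   # assumed FP64 throughput at 1/32 of FP32


def slowdown(flops):
    """Ratio of FP64 step time to FP32 step time for a given amount of work."""
    return (step_time(flops, FP64_RATE, OVERHEAD_S)
            / step_time(flops, FP32_RATE, OVERHEAD_S))


for flops in (1e5, 1e8, 1e11):
    print(f"{flops:.0e} FLOPs per step: FP64/FP32 time ratio {slowdown(flops):.2f}")
```

Under these assumptions the ratio stays close to 1 while the per-step work is tiny and only approaches the raw throughput ratio once the arithmetic dwarfs the fixed cost.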

3

_Arsenie_Boca_ t1_isbgyhs wrote

If the hardware is optimized for it, there probably is not a huge difference in speed, but the gain in model performance from the extra precision is probably negligible too.

The real reason people don't use 64-bit is mainly memory usage. When you train a large model, you can fit much bigger 32-bit/16-bit batches into memory and thereby speed up training.
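The memory side is simple arithmetic: each value takes 2, 4, or 8 bytes at 16-, 32-, or 64-bit precision. A small sketch, where the 100 million parameter count is a hypothetical example and `tensor_megabytes` is an illustrative helper, not a library function:

```python
import struct

# Bytes per value for IEEE half, single, and double precision
BYTES_PER_VALUE = {
    "float16": struct.calcsize("e"),
    "float32": struct.calcsize("f"),
    "float64": struct.calcsize("d"),
}


def tensor_megabytes(n_values, dtype):
    """Storage for n_values numbers of the given dtype, in megabytes (10^6 bytes)."""
    return n_values * BYTES_PER_VALUE[dtype] / 1e6


N_PARAMS = 100_000_000  # hypothetical model size, for illustration only
for dtype in BYTES_PER_VALUE:
    print(f"{dtype}: {tensor_megabytes(N_PARAMS, dtype):.0f} MB for the weights alone")
```

This counts only the weights. Gradients, optimizer state, and activations scale the same way with precision, which is why halving the bit width leaves room for larger batches.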

3

Karyo_Ten t1_iscf3fp wrote

There is no way you are using 64-bit on the GPU.

All the cuDNN code is 32-bit for the very simple reason that on non-Tesla GPUs FP64 throughput is only 1/32 to 1/64 of FP32 throughput.

See https://www.reddit.com/r/CUDA/comments/iyrhuq/comment/g93reth/

So under the hood your FP64 stuff is converted to FP32 when sent to the GPU.

And on Tesla GPUs the ratio is 1/2.
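The dtype that PyTorch tracks for a tensor after it is moved to the device can be inspected directly. A minimal check, assuming PyTorch is installed (it falls back to the CPU when CUDA is unavailable); `dtype_after_transfer` is an illustrative helper name. It only reports the dtype of the tensor and of a matmul result, and says nothing about throughput:

```python
import torch


def dtype_after_transfer(dtype=torch.float64):
    """Return the dtypes of a tensor and of a matmul result on the device."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.ones(4, 4, dtype=dtype).to(device)  # move to GPU if present
    y = x @ x                                     # compute on that device
    return x.dtype, y.dtype


if __name__ == "__main__":
    print(dtype_after_transfer())
```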

3

sutlusalca t1_isbclm9 wrote

Story: You went to the supermarket and bought a bottle of milk. The next day, you went again and bought two bottles of milk. You spent just a few seconds more to buy the extra bottle.

2