Comments

lostmsu t1_j4lrt8a wrote

The performance/$ metric needs an adjustment based on longevity * utilization * electricity cost. Assuming you are going to use the card for 5 years at full load, that's $1,000-$1,500 in electricity at roughly $1 per year per 1 W of constant use (12c/kWh). That would take care of the laughable notion that the Titan Xp is worth anything, and sort the cards much closer to their market positioning.
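
A rough sketch of what that adjustment looks like (the 12c/kWh rate is the one assumed above; the card price and wattage are placeholder numbers, not recommendations):

```python
HOURS_PER_YEAR = 24 * 365          # 8760 h
PRICE_PER_KWH = 0.12               # the 12c/kWh assumed above

# 1 W of constant draw for one year costs about $1.05 at 12c/kWh
dollars_per_watt_year = (1 / 1000) * HOURS_PER_YEAR * PRICE_PER_KWH

def five_year_cost(purchase_price, watts, utilization=1.0, years=5):
    """Purchase price plus electricity over `years` of use at the given utilization."""
    return purchase_price + watts * utilization * years * dollars_per_watt_year

# Placeholder example: a $1,600 card drawing 350 W, used at full load 100% of the time
print(round(dollars_per_watt_year, 2))   # 1.05
print(round(five_year_cost(1600, 350)))  # 3440 -- electricity roughly doubles the cost
```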

65

timdettmers t1_j4lvr7i wrote

I like this idea! I already factored in fixed costs for building a desktop computer, but electricity is also an important part of the overall cost, especially if you compare it to cloud options.

I am currently gathering feedback to update the post later. I think it's quick to create a chart based on this data and create an update later today.

The main problem with estimating cost is getting a good number for the utilization time of GPUs for the average user. For PhD students, the number was about 15% utilization (fully using a GPU 15% of the total time). This means, with an average of 60 watts idle and 350 watts max for an RTX 4090: 60 W * 0.85 + 350 W * 0.15 = 103.5 W. That is about 906 kWh per year, or about $210 per year per RTX 4090 (assuming a US average of $0.23 per kWh).
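
For reference, the same calculation as a small sketch (the idle/max wattage, 15% utilization, and $0.23/kWh figures are just the assumptions from the paragraph above):

```python
IDLE_W, MAX_W = 60, 350      # assumed RTX 4090 idle / full-load draw
UTILIZATION = 0.15           # fraction of time at full load (PhD-student estimate)
PRICE_PER_KWH = 0.23         # assumed US average, $/kWh

avg_watts = IDLE_W * (1 - UTILIZATION) + MAX_W * UTILIZATION     # 103.5 W
kwh_per_year = avg_watts * 24 * 365 / 1000                       # ~906.7 kWh
print(round(kwh_per_year), round(kwh_per_year * PRICE_PER_KWH))  # 907 209 -> ~$210/year
```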

Does that look good to you?

Edit: part of this seems to have gotten lost in editing. Oops! I re-added the missing details.

54

asdfzzz2 t1_j4nl1tn wrote

> This means, with an average of 60 watt idle and 350 watt max for a RTX 4090

An RTX 4090 "idles" (streaming in the background) at 10-15 watts. A 4K 144 Hz monitor might change that, but 60 watts is way too much for the GPU alone.

8

init__27 OP t1_j4mavhs wrote

Oh wow, great to see you here as well, Tim 🙏

As a Kaggler, my usage varies extensively: if I end up in a deep learning competition, the usage for those 1-2 months is usually around 60-100%, I would say.

I know many top Kagglers who compete year-round; I would vaguely guess their usage is the highest, percentage-wise.

4

SearchAtlantis t1_j4ne6jj wrote

For what it's worth, I think 15% seems low. Having just finished an MS with deep learning in my thesis, I used my GPU about 25% of the time over the course of a year: quick shallow tests for architecture and other changes, then running those changes at full depth for comparison.

2

anothererrta t1_j4pagpo wrote

If you go to all this trouble, please keep in mind that electricity prices vary a lot across the world. In some places in Europe people pay twice as much as you assumed above.

Making it clear how you arrive at your value calculation in an updated post (or even making it a dynamic calculator where people can enter their cost/kWh) would be very useful.

2

tripple13 t1_j4m7ykq wrote

Well, somehow I expected TD's conclusion to be "Skip the current gen, wait for newer gen"

And yet, here we are.

14

timdettmers t1_j4mfjbw wrote

I thought about making this recommendation, but the next generation of GPUs will not be much better. You probably need to wait until about 2027 for a better GPU to come along. I think for many, waiting 4 years for an upgrade might be too long, so I mostly recommend buying now. I think the RTX 40 cards are a pretty good investment that will last a bit longer than previous generations.

19

init__27 OP t1_j4mb9ga wrote

If it's any help, u/tripple13, I bought two 3090s recently at a "discounted" rate ($1,400, compared to the $3,500 I paid when they were kings), and I'm really happy with them.

BUT OF COURSE I ALSO WISH I HAD THE LATEST & FASTEST ONES :')

7

chief167 t1_j4n7wdv wrote

A little bit dangerous, because the A100 is a beast when you look at performance/kWh. If you run a lot of heavy workloads, it's your best option, but on this chart it looks like the worst.

TCO != Purchase price

14

learn-deeply t1_j4na276 wrote

Note: the graphs comparing GPUs are not actual benchmarks but theoretical results. Nvidia likes to arbitrarily add restrictions to their non-datacenter GPUs, so it's not clear what the real-world performance is.

8

numpee t1_j4whr9r wrote

Hi u/timdettmers, I had a great time reading your blog post :) I just wanted to point out something that might be worth mentioning: the issue with the 4090 (and probably the 4080 as well) is that it won't fit in servers, specifically 4U rack-mounted servers. In rack-mounted servers, the PCIe slots are placed at the bottom (facing upwards), so the GPUs sit "vertically" (PCIe pointing downwards). The 4090 is too tall for a 4U server, which makes it unusable (plus, 3.5 slots for a single GPU complicates things further).

3

protocolypse t1_j4o8t8d wrote

This and the Backblaze report are among my favorite articles.

2

serge_mamian t1_j4ods2g wrote

The question is: how the fuck does one get a 4090? I am really at my wits' end; Amazon has a few at double MSRP.

2

Epitaque t1_j4qvreg wrote

Don't know what country you're in, but it regularly comes in stock on Newegg during the week at ~$1,700.

1

serge_mamian t1_j4r4soo wrote

I'm in the US; I check Newegg regularly and it's always out of stock. How do you catch it?

1

BeatLeJuce t1_j4m9pp6 wrote

Overall nice, but the article uses some expressions without ever explaining them. For example: what is an H100, and what is an A100? Somewhere in the article it says that H100 = RTX 40 cards; somewhere else it says the A100 is an RTX 40 card. Which is which?

Also, what is TF32? It's an expression that appears in a paragraph without explanation.

0

timdettmers t1_j4mfra6 wrote

This is good feedback. I wanted to make another pass this morning to clean up references like this, but did not have the time. I will try to be clearer about this in the next update (later today, probably).

18

JustOneAvailableName t1_j4my6kp wrote

Great article!

You say this about sparsity:

> It does not seem so. Since the granularity of the sparse matrix needs to have 2 zero-valued elements, every 4 elements, the sparse matrices need to be quite structured.

Wouldn't a slightly more structured dropout be a perfect fit?
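
For anyone trying to picture the constraint in the quote, here is a minimal sketch that enforces a 2:4 pattern by zeroing the two smallest-magnitude entries in every group of four. It only illustrates the structure, not how the hardware or a structured-dropout variant would actually use it:

```python
import numpy as np

def prune_2_of_4(weights):
    """Zero the 2 smallest-|value| entries in each consecutive group of 4 (2:4 sparsity)."""
    w = weights.reshape(-1, 4).copy()             # assumes the size is a multiple of 4
    drop = np.argsort(np.abs(w), axis=1)[:, :2]   # indices of the 2 smallest per group
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(weights.shape)

w = np.random.randn(2, 8).astype(np.float32)
print(prune_2_of_4(w))   # every consecutive group of 4 now contains exactly 2 zeros
```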

3

Freonr2 t1_j4mvhhf wrote

A100 and H100 are data center GPUs. Very expensive, tuned for training large models. They also use on-package HBM memory instead of GDDR on the board for improved memory bandwidth.

The A100 is Ampere, the same architecture as the 30xx series, but built for training, with a lot more tensor cores and less focus on CUDA cores. It is most often seen in SXM form factor in special servers that offer substantially more NVLink bandwidth between GPUs for multi-GPU training (and the special servers the SXM cards go into also have considerable network bandwidth for clustered training). They do make PCIe versions. It does not support FP8. A typical setup is a DGX server with 8x A100. These are a few hundred grand for the whole server, even ignoring the power and network requirements, etc., needed to utilize it.

The H100 is Hopper, newer than Ampere. I don't believe it was ever made into a consumer part, but it is perhaps closer to Ada (40xx) in features than to Ampere (30xx), since it has FP8. It's basically the replacement for the A100, much like the 40xx is the replacement for the 30xx. These again often come in HGX server boxes for several hundred grand. Unsure if there is a PCIe version?

Nvidia removed NVLink from the 40xx series, but it's still technically available on 3090s. They're sort of segmenting the market here.

If they decide to release a 4090 with 48 GB (or an Ada Titan, or whatever branding they settle on), it could be a monster if you only need or want a single card, but it may also be $3k+...

12

init__27 OP t1_j4mb2iy wrote

The author is hanging out and collecting feedback on here. I'm sure he'll correct it in an update.

Maybe I'm too deep in the weeds, but if I were in the author's shoes, I would also assume that the reader knows these terms and cards.

7

royalemate357 t1_j4migdx wrote

TF32 is TensorFloat-32, a relatively new precision format for newer GPUs. Basically, when doing math, it uses the same number of mantissa bits as FP16 (10 bits) and the same number of exponent bits as normal float32 (8 bits). More on it here: https://blogs.nvidia.com/blog/2020/05/14/tensorfloat-32-precision-format/
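
If it helps to see where the bits go, here is a rough illustration in Python. It truncates the mantissa rather than rounding the way the hardware actually does, but it shows which bits a TF32 operation keeps:

```python
import struct

def to_tf32_precision(x: float) -> float:
    """Keep FP32's sign bit and 8 exponent bits, plus only the top 10 mantissa bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= ~((1 << 13) - 1)     # clear the 13 low mantissa bits (23 - 10 = 13)
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(to_tf32_precision(0.1))    # 0.0999755859375 -- 0.1 with only 10 mantissa bits kept
```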

3

BeatLeJuce t1_j4p938f wrote

Thanks for the explanation! Why call it TF32 when it appears to have 19 bits? (IIUC it's bfloat16 with 3 additional mantissa bits?)

1

royalemate357 t1_j4qdfwj wrote

Tbh I don't think it's an especially good name, but I believe the answer to your question is that a TF32 value actually uses 32 bits of storage in memory. It's just that when values are passed into the tensor cores for matmuls, they are temporarily rounded down to this 19-bit precision format.

>Dot product computation, which forms the building block for both matrix multiplies and convolutions, rounds FP32 inputs to TF32, computes the products without loss of precision, then accumulates those products into an FP32 output (Figure 1).

(from https://developer.nvidia.com/blog/accelerating-ai-training-with-tf32-tensor-cores/)

3

mentatf t1_j4pndf8 wrote

"legendary"..? really..?

0