Comments


lostmsu t1_j4lrt8a wrote

The performance/$ characteristic needs an adjustment based on longevity * utilization * electricity cost. Assuming you are going to use the card for 5 years at full load, that's $1000-$1500 in electricity at roughly $1 per year per 1 W of constant use (at 12c/kWh). This would take care of the laughable notion that the Titan Xp is worth anything, and sort cards much closer to their market positioning.
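A minimal sketch of that rule of thumb (the $0.12/kWh rate, 5-year horizon, and example wattage are assumptions from this comment, not from the article):

```python
HOURS_PER_YEAR = 24 * 365  # 8760 h

def yearly_cost_per_watt(price_per_kwh: float = 0.12) -> float:
    # 1 W drawn continuously for a year is 8.76 kWh
    return 1e-3 * HOURS_PER_YEAR * price_per_kwh

def five_year_full_load_cost(tdp_watts: float, price_per_kwh: float = 0.12) -> float:
    return 5 * tdp_watts * yearly_cost_per_watt(price_per_kwh)

print(yearly_cost_per_watt())          # ~$1.05 per year per watt of constant draw
print(five_year_full_load_cost(250))   # ~$1314 over 5 years for a 250 W card at full load
```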

65

timdettmers t1_j4lvr7i wrote

I like this idea! I already factored in the fixed costs of building a desktop computer, but electricity is also an important part of the overall cost, especially if you compare it to cloud options.

I am currently gathering feedback to update the post later. I think it should be quick to create a chart based on this data and post an update later today.

The main problem in estimating cost is getting a good number for the utilization of GPUs for the average user. For PhD students, the number was about 15% utilization (fully using a GPU 15% of the total time). This means, with 60 watts idle and 350 watts at maximum for an RTX 4090: 60 W * 0.85 + 350 W * 0.15 = 103.5 W. That is about 906 kWh per year, or about $210 per year per RTX 4090 (assuming the US average of $0.23 per kWh).
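A minimal sketch of the estimate above (all numbers are the ones from this comment):

```python
HOURS_PER_YEAR = 24 * 365  # 8760 h

def yearly_gpu_electricity_cost(idle_w, max_w, utilization, price_per_kwh):
    # Utilization-weighted average power, then annual energy and cost
    avg_w = idle_w * (1 - utilization) + max_w * utilization
    kwh_per_year = avg_w * HOURS_PER_YEAR / 1000
    return avg_w, kwh_per_year, kwh_per_year * price_per_kwh

avg_w, kwh, usd = yearly_gpu_electricity_cost(60, 350, 0.15, 0.23)
# avg_w ≈ 103.5 W, kwh ≈ 906 kWh/year, usd ≈ $210/year for one RTX 4090
```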

Does that look good to you?


Edit: part of this seems to have gotten lost in editing. Oops! I re-added the missing details.

54

tripple13 t1_j4m7ykq wrote

Well, somehow I expected TD's conclusion to be "Skip the current gen, wait for newer gen"

And yet, here we are.

14

BeatLeJuce t1_j4m9pp6 wrote

Overall nice, but the article also uses some terms without ever explaining them. For example: what is an H100, and what is an A100? Somewhere in the article, it says that H100 = RTX 40 cards; somewhere else it says the A100 is an RTX 40 card. Which is which?

Also, what is TF32? It's an expression that appears in a paragraph without explanation.

0

init__27 OP t1_j4mavhs wrote

Oh wow, great to see you here as well, Tim 🙏

As a Kaggler, my usage varies extensively: if I end up in a deep learning competition, usage is usually around 60-100% for 1-2 months, I would say.

​

I know many top Kagglers who compete year-round; I would vaguely guess their utilization is the highest in percentage terms.

4

init__27 OP t1_j4mb2iy wrote

The author is hanging out and collecting feedback here. I'm sure he'll correct it in an update.

Maybe I'm too deep into the field, but if I were in the author's shoes, I would also assume that the reader knows these terms and cards.

7

init__27 OP t1_j4mb9ga wrote

If it's any help, u/tripple13, I bought two 3090s recently at a "discounted" rate ($1400, compared to the $3500 I paid when they were kings), and I'm really happy with them.

BUT OF COURSE I ALSO WISH I HAD THE LATEST & FASTEST ONES :')

7

timdettmers t1_j4mfjbw wrote

I thought about making this recommendation, but the next generation of GPUs will not be much better. You probably need to wait until about 2027 for a better GPU to come along. I think that for many, waiting 4 years for an upgrade might be too long, so I mostly recommend buying now. I think the RTX 40 cards are a pretty good investment that will last a bit longer than previous generations.

19

timdettmers t1_j4mfra6 wrote

This is good feedback. Wanted to make another pass this morning to clean references like this up, but did not have the time. Will try to be more clear about this in the next update (later today, probably).

18

Freonr2 t1_j4mvhhf wrote

A100 and H100 are data center GPUs. Very expensive, tuned for training large models. They also use on-package HBM memory instead of GDDR on the board for improved memory bandwidth.

A100 is Ampere, the same architecture as the 30xx series, but built for training, with a lot more tensor cores and less focus on CUDA cores. It is most often seen in the SXM form factor in special servers that offer substantially more NVLink bandwidth between GPUs for multi-GPU training (and the special servers the SXM cards go into also have considerable network bandwidth for clustered training). They do make PCIe versions. It does not support FP8. A typical setup is a DGX server with 8x A100. These are a few hundred grand for the whole server, even ignoring the power and network requirements, etc., needed to utilize it.

H100 is Hopper, newer than Ampere; I don't believe it has ever been made into a consumer part, but it is perhaps closer to Ada (40xx) in features than to Ampere (30xx) since it has FP8. It's basically the replacement for the A100, much like the 40xx is the replacement for the 30xx. These again often come in HGX server boxes for several hundred grand. Unsure if there is a PCIe version?

Nvidia removed NVLink from the 40xx series, but it's still technically available on 3090s. They're sort of segmenting the market here.

If they decide to release a 4090 with 48 GB (or Ada Titan or whatever branding they decide on), it could be a monster card if you only need or want a single card, but it may also be $3k+...

12

JustOneAvailableName t1_j4my6kp wrote

Great article!

You say this about sparsity:

> It does not seem so. Since the granularity of the sparse matrix needs to have 2 zero-valued elements, every 4 elements, the sparse matrices need to be quite structured.

Wouldn't a slightly more structured dropout be a perfect fit?
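For reference, a minimal sketch (my own, not from the article; the helper name is made up) of what the 2:4 granularity means in practice:

```python
import torch

def two_to_four_mask(x: torch.Tensor) -> torch.Tensor:
    # In every contiguous group of 4 values along the flattened tensor,
    # keep the 2 largest-magnitude entries and zero the other 2 --
    # the pattern Ampere's sparse tensor cores expect.
    groups = x.reshape(-1, 4)                   # assumes numel is divisible by 4
    keep = groups.abs().topk(2, dim=1).indices
    mask = torch.zeros_like(groups, dtype=torch.bool)
    mask.scatter_(1, keep, True)
    return mask.reshape(x.shape)

w = torch.randn(8, 16)
sparse_w = w * two_to_four_mask(w)              # 50% sparse, but block-structured
```

Picking 2 random indices per group of 4 instead of the top-2 would give exactly the kind of "slightly more structured dropout" you describe.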

3

chief167 t1_j4n7wdv wrote

A little bit dangerous, because the A100 is a beast when you look at performance/kWh. If you run a lot of heavy workloads, it's your best option, but on this chart it looks like the worst.

TCO != Purchase price

14

learn-deeply t1_j4na276 wrote

Note: the graphs comparing GPUs are not actual benchmarks but theoretical results. Nvidia likes to arbitrarily add restrictions to their non-datacenter GPUs, so it's not clear what the real-world performance is.

8

SearchAtlantis t1_j4ne6jj wrote

For what it's worth, I think 15% seems low. Having just finished an MS with deep learning in my thesis, over the course of a year I used my GPU about 25% of the time: quick, shallow tests for architecture and other changes, then running those changes at full depth for comparison.

2

asdfzzz2 t1_j4nl1tn wrote

> This means, with an average of 60 watt idle and 350 watt max for a RTX 4090

RTX 4090 "idles" (stream at background) at 10-15 watt. 4k144hz monitor might change it, but 60 watt is way too much for GPU only.

8

protocolypse t1_j4o8t8d wrote

This and the Backblaze report are among my favorite articles.

2

serge_mamian t1_j4ods2g wrote

The question is: how the fuck does one get a 4090? I am really at my wits' end; Amazon has a few at double MSRP.

2

anothererrta t1_j4pagpo wrote

If you go to all this trouble, please keep in mind that electricity prices vary a lot across the world. In some places in Europe people pay twice as much as you assumed above.

Making it clear how you arrive at your value calculation in an updated post (or even making it a dynamic calculator where people can enter their cost/kWh) would be very useful.

2

mentatf t1_j4pndf8 wrote

"legendary"..? really..?

0

royalemate357 t1_j4qdfwj wrote

Tbh I don't think it's an especially good name, but I believe the answer to your question is that it actually uses 32 bits to store a TF32 value in memory. It's just that when they pass it into the tensor cores to do matmuls, they temporarily downcast it to this 19-bit precision format.

>Dot product computation, which forms the building block for both matrix multiplies and convolutions, rounds FP32 inputs to TF32, computes the products without loss of precision, then accumulates those products into an FP32 output (Figure 1).

(from https://developer.nvidia.com/blog/accelerating-ai-training-with-tf32-tensor-cores/)
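This is also why frameworks expose TF32 as a mode rather than a dtype. A sketch of the PyTorch toggles (these flags exist in PyTorch, but TF32 only takes effect on Ampere or newer GPUs):

```python
import torch

# Tensors stay ordinary 32-bit floats in memory; only the tensor-core matmul
# internally rounds the mantissa to 10 bits (the 19-bit TF32 format).
torch.backends.cuda.matmul.allow_tf32 = True   # allow TF32 for matmuls
torch.backends.cudnn.allow_tf32 = True         # allow TF32 for convolutions

a = torch.randn(1024, 1024, device="cuda")     # dtype is still torch.float32
b = torch.randn(1024, 1024, device="cuda")
c = a @ b                                      # products accumulated in FP32
```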

3

numpee t1_j4whr9r wrote

Hi u/timdettmers, I had a great time reading your blog post :) I just wanted to point out something that might be worth mentioning: the issue with the 4090 (and probably the 4080 as well) is that it won't fit in servers, specifically 4U rack-mounted servers. In rack-mounted servers, the PCIe slots are placed at the bottom (facing upwards), so the GPUs stand "vertically" (PCIe connector pointing downwards). The 4090 is too tall for a 4U server, which makes it unusable (plus, 3.5 slots for a single GPU complicates things further).

3