ThatInternetGuy t1_ir9v9aj wrote

Yes, 25% improvement.

My point is, Nvidia CUTLASS has practically improved matrix multiplication by 200% to 900%. Why do you guys think matrix multiplication is currently slow with GPU, I don't get that. The other guy said it's an unsolved problem. There is nothing unsolved when it comes to matrix multiplication. It has been vastly optimized over the years since RTX first came out.

It's apparent that RTX Tensor Cores and CUTLASS have really solved it. It's no coincidence that the recent explosion of ML progress came when Nvidia added more Tensor Cores, and now with CUTLASS templates, all models will benefit from a 200% to 900% performance boost.

This RL-designed GEMM is the icing on the cake, giving that extra 25%.

0

ReginaldIII t1_ir9w5x1 wrote

> It's apparent that RTX Tensor Cores and CUTLASS have really solved it.

You mean more efficiency was achieved using a novel type of hardware implementing a state-of-the-art algorithm?

So if we develop methods for searching for algorithms that need even fewer operations, we can work on developing hardware that directly leverages those algorithms.
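
To make "better op requirements" concrete, here is a minimal sketch using Strassen's classic 2x2 scheme, which needs 7 scalar multiplications where the schoolbook method needs 8. This is a textbook illustration chosen for this note, not the RL-discovered algorithm discussed in the thread, and the function names `naive_2x2` and `strassen_2x2` are made up for the example.

```python
def naive_2x2(A, B):
    """Schoolbook 2x2 product: 8 scalar multiplications, 4 additions."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    return [
        [a11 * b11 + a12 * b21, a11 * b12 + a12 * b22],
        [a21 * b11 + a22 * b21, a21 * b12 + a22 * b22],
    ]


def strassen_2x2(A, B):
    """Strassen's 2x2 product: 7 scalar multiplications, 18 additions."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    # The seven products; each line performs exactly one multiplication.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # Recombine using additions and subtractions only.
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(naive_2x2(A, B))
print(strassen_2x2(A, B))
```

Both calls return the same product. The saving matters when the entries are themselves matrix blocks, because the scheme can then be applied recursively and each saved multiplication is a saved sub-matrix product. Whether that translates into wall-clock speed on a given GPU depends on the hardware, which is the point being argued here.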

> Why do you guys think matrix multiplication is currently slow with GPU, I don't get that.

I don't think that. I think that developing new hardware and implementing new algorithms that leverage that hardware is how it gets even faster.

And it's an absurd statement for you to make because it's entirely relative. Go back literally 4 years and you could say the same thing despite how much has happened since.

> This has never been figured out for ages; however, it's up to the debate if the AI could improve the

> The other guy said it's an unsolved problem. There is nothing unsolved when it comes to matrix multiplication. It has been vastly optimized over the years since RTX first came out.

The "other guy" is YOU!

3

ThatInternetGuy t1_ir9zlmq wrote

This is not the first time RL has been used to produce efficient routing on silicon wafers and circuit boards. This announcement is good, but not that good: it's a 25% improvement in the reduction of silicon area.

I thought they discovered a new Tensor Core design that gives at least 100% improvement.

0