Submitted by killver in MachineLearning
programmerChilli wrote
Reply to comment by AlmightySnoo in [N] First RTX 4090 ML benchmarks by killver
I mean... it's hard to write efficient matmuls :)
But... recent developments (e.g. CUTLASS and Triton) do let NN frameworks write their own efficient matmuls, so I think you'll start seeing frameworks use them to fuse other operators into matmuls :)
You can already see some of that being done in projects like AITemplate.
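To make "fusing an operator into a matmul" concrete, here is a toy sketch in plain Python. It is not Triton, CUTLASS, or AITemplate code, and the function names are hypothetical. The unfused path writes the whole matmul output and then runs a separate bias + ReLU pass over it. The fused path applies that bias + ReLU "epilogue" to each accumulator before the single write, so no intermediate output is stored and re-read. In pure Python this only shows the data flow and gives no real speedup. On a GPU, the benefit comes from skipping that extra round trip through memory.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Unfused matmul: materializes the full output for a later op to re-read."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)] for i in range(n)]


def bias_relu(c: Matrix, bias: List[float]) -> Matrix:
    """Separate elementwise pass: reads the matmul output back and writes a new one."""
    return [[max(0.0, x + bias[j]) for j, x in enumerate(row)] for row in c]


def fused_matmul_bias_relu(a: Matrix, b: Matrix, bias: List[float]) -> Matrix:
    """Fused version: bias + ReLU is applied to the local accumulator before
    the result is written, so no intermediate matmul output is ever stored."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            # Epilogue: applied to the accumulator, then written exactly once.
            out[i][j] = max(0.0, acc + bias[j])
    return out


if __name__ == "__main__":
    a = [[1.0, -2.0], [3.0, 4.0]]
    b = [[1.0, 0.0], [0.0, 1.0]]
    bias = [0.5, -1.0]
    print(fused_matmul_bias_relu(a, b, bias))
    print(bias_relu(matmul(a, b), bias))
```

Both paths compute the same result. The difference is only in how many times the output passes through memory.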
I will note one other thing, though: not being able to fuse operators into matmuls isn't as big a bottleneck in training; this optimization primarily helps inference.