programmerChilli t1_is7vgbp wrote on October 13, 2022 at 10:57 PM

Reply to comment by AlmightySnoo in [N] First RTX 4090 ML benchmarks by killver

I mean... it's hard to write efficient matmuls :)

But... recent developments (e.g. cuBLAS and Triton) do allow NN frameworks to write efficient matmuls, so I think you'll start seeing frameworks use them to fuse other operators into matmuls :)

You can already see some of that being done in projects like AITemplate.
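
To make "fusing other operators with matmuls" concrete, here's a rough sketch in plain Python (standing in for a GPU kernel, so it says nothing about real performance). The function names and the bias + ReLU epilogue are just my own illustration, not anything taken from Triton or AITemplate. The unfused version writes the whole matmul result out and reads it back in a second pass; the fused version applies the extra ops to the accumulator before the single store.

```python
# Illustrative only: plain Python lists, hypothetical function names.

def matmul_then_bias_relu(a, b, bias):
    """Unfused: the matmul materializes a full intermediate, then a second pass reads it back."""
    n, k, m = len(a), len(b), len(b[0])
    # Pass 1: matmul, result stored in full (on a GPU this is a round trip to global memory).
    tmp = [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)] for i in range(n)]
    # Pass 2: elementwise bias add + ReLU over the stored intermediate.
    return [[max(tmp[i][j] + bias[j], 0.0) for j in range(m)] for i in range(n)]


def fused_matmul_bias_relu(a, b, bias):
    """Fused: bias add + ReLU run as an epilogue on the accumulator, so there is one store per element."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            # Epilogue applied while the value is still "in registers".
            out[i][j] = max(acc + bias[j], 0.0)
    return out
```

Both give the same answer; the only difference is that the fused one never stores the intermediate matmul output, which is the memory traffic that fusion saves.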

I will note one other thing though - not being able to fuse operators with matmuls is not as big of a bottleneck in training; this optimization primarily helps in inference.