osdd_alt_123 t1_jd6ufjz wrote on March 22, 2023 at 7:07 AM
Reply to comment by maizeq in [R] SPDF - Sparse Pre-training and Dense Fine-tuning for Large Language Models by CS-fan-101
Nvidia has 2:4 structured sparsity in the Ampere architecture and, if memory serves, in an architecture or two before it as well.
So in every block of 4 weights, 2 must be dropped and 2 retained. That pattern is how they claim their 2x throughput at the hardware level.
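As a rough sketch of what that pattern means (my own illustration, not Nvidia's code), pruning a weight array to 2:4 by magnitude could look like:

```python
import numpy as np

def prune_2_4(w):
    """Prune a 1-D weight array to the 2:4 pattern:
    in every block of 4, keep the 2 largest-magnitude entries."""
    w = w.reshape(-1, 4).copy()
    # indices of the 2 smallest-magnitude entries per block -> zero them
    idx = np.argsort(np.abs(w), axis=1)[:, :2]
    np.put_along_axis(w, idx, 0.0, axis=1)
    return w.reshape(-1)

w = np.array([0.9, -0.1, 0.4, 0.05, -0.7, 0.2, 0.3, -0.8])
print(prune_2_4(w))  # -> [ 0.9  0.   0.4  0.  -0.7  0.   0.  -0.8]
```

Each block of 4 ends up with exactly 2 nonzeros, which is what the sparse tensor cores exploit.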
You can, however, emulate sparsity in software at levels above the hardware (e.g. with masks), in a variety of other ways. Hope this helps.