osdd_alt_123 t1_jd6ufjz wrote on March 22, 2023 at 7:07 AM
Reply to comment by maizeq in [R] SPDF - Sparse Pre-training and Dense Fine-tuning for Large Language Models by CS-fan-101
Nvidia has 2:4 structured sparsity in the Ampere architecture and, if memory serves, in an architecture or two before it as well.
So in every block of 4 weights, 2 must be dropped and 2 retained. That pattern is how they claim their 2x throughput at the hardware level.
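As a rough sketch of what that pattern means (my own illustration, not Nvidia's code), pruning a weight array to 2:4 by magnitude could look like:

```python
import numpy as np

def prune_2_4(w):
    """Prune a 1-D weight array to the 2:4 pattern:
    in every block of 4, keep the 2 largest-magnitude entries."""
    w = w.reshape(-1, 4).copy()
    # indices of the 2 smallest-magnitude entries per block -> zero them
    idx = np.argsort(np.abs(w), axis=1)[:, :2]
    np.put_along_axis(w, idx, 0.0, axis=1)
    return w.reshape(-1)

w = np.array([0.9, -0.1, 0.4, 0.05, -0.7, 0.2, 0.3, -0.8])
print(prune_2_4(w))  # -> [ 0.9  0.   0.4  0.  -0.7  0.   0.  -0.8]
```

Each block of 4 ends up with exactly 2 nonzeros, which is what the sparse tensor cores exploit.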
You can, however, emulate sparsity in software at levels above the hardware (e.g. with masks), in a variety of other ways. Hope this helps.