maizeq t1_jd76a7x wrote
Reply to comment by osdd_alt_123 in [R] SPDF - Sparse Pre-training and Dense Fine-tuning for Large Language Models by CS-fan-101
Ah I see, thank you for the clarification.
brownmamba94 t1_jd8lqry wrote
Also, the N:M sparsity structure is much more constrained in terms of mask diversity compared to unstructured sparsity. In Table 1 of the N:M Transposable sparsity paper, they compare the mask diversity of different sparsity techniques (both unstructured and structured), and as expected unstructured sparsity achieves the highest diversity. I think this matters especially for dynamic sparse training, because the algorithm has a much larger search space of sparse subnetworks to explore. Also, imposing structured sparsity like N:M tends to reduce the expressivity of a weight matrix at higher sparsity levels, which can be a limitation if you want high compression ratios.
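To make the mask-diversity point concrete, here's a minimal sketch (not from the paper; the matrix size and the 2:4 pattern are illustrative assumptions) that counts how many binary masks are possible for a weight matrix under unstructured 50% sparsity versus structured 2:4 sparsity at the same density:

```python
# Minimal sketch: compare mask diversity (number of possible binary masks)
# for unstructured sparsity vs. N:M structured sparsity at equal density.
# Matrix size and the 2:4 pattern are illustrative, not from the paper.

import math

def log2_unstructured_masks(num_weights: int, density: float) -> float:
    """log2 of C(n, k): choose which k of n weights stay nonzero."""
    k = int(num_weights * density)
    return math.log2(math.comb(num_weights, k))

def log2_nm_masks(num_weights: int, n: int, m: int) -> float:
    """log2 of the N:M mask count: each block of m weights keeps n
    nonzeros, giving C(m, n) independent choices per block."""
    num_blocks = num_weights // m
    return num_blocks * math.log2(math.comb(m, n))

num_weights = 1024  # e.g., a small 32x32 weight matrix
print(f"unstructured 50%: ~2^{log2_unstructured_masks(num_weights, 0.5):.0f} masks")
print(f"2:4 structured:   ~2^{log2_nm_masks(num_weights, 2, 4):.0f} masks")
# Prints roughly 2^1019 vs 2^662: the unstructured search space is
# astronomically larger, which is the diversity gap described above.
```

Even at the same 50% density, the 2:4 constraint removes hundreds of bits of freedom from the mask, which is why a dynamic sparse training algorithm has far fewer subnetworks to explore under N:M structure.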