kilow4tt t1_jd68jnd wrote on March 22, 2023 at 3:02 AM

Reply to [R] SPDF - Sparse Pre-training and Dense Fine-tuning for Large Language Models by CS-fan-101

Was there any effort to go from 75% sparsity during pre-training to a lower sparsity level (e.g. 25%) during fine-tuning, rather than going strictly from 75% sparsity to 0%?