kilow4tt t1_jd68jnd wrote on March 22, 2023 at 3:02 AM
Reply to [R] SPDF - Sparse Pre-training and Dense Fine-tuning for Large Language Models by CS-fan-101
Was there any effort to go from 75% sparsity during pre-training to a lower sparsity (e.g. 25%) during fine-tuning, rather than strictly going from 75% sparsity to 0% (fully dense)?
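To make the question concrete, here is a rough sketch of what "partial densification" could mean at the mask level. It is not how SPDF does it. It simply assumes an unstructured 0/1 weight mask and re-enables randomly chosen pruned positions until the target sparsity is reached. The helper names (`make_mask`, `relax_mask`) are hypothetical, and random selection is only one option; magnitude- or gradient-based regrowth would be equally plausible choices.

```python
import random

def make_mask(n, sparsity, seed=0):
    """Hypothetical helper: 0/1 mask of length n with round(n * sparsity) zeros at random positions."""
    rng = random.Random(seed)
    zero_idx = set(rng.sample(range(n), round(n * sparsity)))
    return [0 if i in zero_idx else 1 for i in range(n)]

def relax_mask(mask, target_sparsity, seed=0):
    """Hypothetical helper: re-enable random pruned positions until sparsity equals target.

    Assumption: weights active during pre-training stay active, so the new
    mask is a superset of the old one. Newly enabled weights would still
    need some initial value (e.g. zero) before fine-tuning.
    """
    rng = random.Random(seed)
    target_zero = round(len(mask) * target_sparsity)
    zeros = [i for i, m in enumerate(mask) if m == 0]
    if target_zero > len(zeros):
        raise ValueError("target sparsity must not exceed current sparsity")
    new_mask = list(mask)
    for i in rng.sample(zeros, len(zeros) - target_zero):
        new_mask[i] = 1
    return new_mask

def sparsity(mask):
    """Fraction of masked-out (zero) positions."""
    return mask.count(0) / len(mask)

pre = make_mask(1000, 0.75)   # 75% sparse during pre-training
ft = relax_mask(pre, 0.25)    # 25% sparse during fine-tuning (the option asked about)
dense = relax_mask(pre, 0.0)  # 0% sparse: the fully dense fine-tuning case
print(sparsity(pre), sparsity(ft), sparsity(dense))  # 0.75 0.25 0.0
```

The 75% to 0% case is just the limit of the same operation, so a 25% target would sit between the sparse pre-training setup and the dense fine-tuning setup in both parameter count and compute.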