Submitted by CS-fan-101 t3_11yzsz6 in MachineLearning
Note: Thank you r/MachineLearning for providing so many awesome naming alternatives! We'll revisit the acronym and update accordingly.
Note #2: We are revising the name to Sparse-IFT. We appreciate the candid feedback and look forward to hearing any additional feedback you have on our research.
We are excited to announce that our paper on Sparse Iso-FLOP Transformations (Sparse-IFT) is now available on arXiv. Sparse-IFT uses sparsity to increase accuracy while maintaining the same FLOPs as the dense model. In this research, we replace dense layers with Sparse-IFT and significantly improve accuracy on computer vision and natural language processing tasks without modifying training hyperparameters.
Highlights of this work include a 3.5% accuracy improvement for ResNet-18 on ImageNet and a 0.4 perplexity reduction for GPT-3 Small on WikiText-103, in both cases matching larger dense variants that use 2x or more FLOPs.
Sparse-IFT is simple to use, provides a larger search space for finding optimal sparse masks, and is parameterized by a single hyperparameter: the sparsity level. A minimal sketch of the iso-FLOP idea follows.
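To make the iso-FLOP idea concrete, here is a minimal PyTorch sketch in the spirit of one Sparse-IFT family (the Sparse Wide transformation): widen a linear layer, then apply a static random sparse mask so the nonzero weight count, and therefore the FLOPs, matches the original dense layer. The class name, the random mask, and the single-layer scaling are illustrative assumptions, not the paper's exact recipe; in the full method, widths are scaled consistently across the network so every layer stays on the dense FLOP budget.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseWideLinear(nn.Module):
    """Hypothetical sketch of an iso-FLOP sparse replacement for nn.Linear:
    widen the layer, then apply a static random mask so the nonzero weight
    count (and hence the FLOPs) matches the original dense layer."""

    def __init__(self, in_features: int, out_features: int, sparsity: float = 0.75):
        super().__init__()
        # Widen the output so that (1 - sparsity) * widened == out_features,
        # keeping nonzero params roughly equal to the dense in * out count.
        widened = round(out_features / (1.0 - sparsity))
        self.linear = nn.Linear(in_features, widened)
        # Static unstructured random mask; the paper searches over better
        # mask choices, this one just fixes the FLOP budget.
        mask = (torch.rand(widened, in_features) > sparsity).float()
        self.register_buffer("mask", mask)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.linear.weight * self.mask, self.linear.bias)

# Usage: a dense nn.Linear(512, 512) and this sparse-wide version cost about
# the same FLOPs, but the sparse layer has a 2048-wide representation.
layer = SparseWideLinear(512, 512, sparsity=0.75)
out = layer(torch.randn(8, 512))  # -> shape (8, 2048)
```

Note that the widened output feeds the next layer, so in practice each downstream layer is transformed as well to keep the whole network's FLOPs iso with the dense baseline.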
This is independent of the research we posted yesterday, which demonstrates the ability to reduce pre-training FLOPs while maintaining accuracy on downstream tasks.
This is the first work (that we know of!) to demonstrate the use of sparsity for improving the accuracy of models via a set of sparse transformations.
mouldygoldie t1_jdaa3nv wrote
I think I'd look for a different acronym to SIFT, given that's a very well known feature detector and descriptor in computer vision...