GamerMinion
GamerMinion t1_jddeprr wrote
Reply to [R] Introducing SIFT: A New Family of Sparse Iso-FLOP Transformations to Improve the Accuracy of Computer Vision and Language Models by CS-fan-101
When you say "FLOP-equivalent", does that also mean compute-time equivalent?
I ask this because on GPUs, models like EfficientNet, which technically have far fewer FLOPs and parameters, can be much slower than a standard ResNet of the same accuracy, because they parallelize far less efficiently.
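To illustrate the gap I mean, here's a rough PyTorch timing sketch (the model choices and batch size are just illustrative, not from the paper): FLOP counts and measured GPU latency often tell different stories.

```python
# Hypothetical micro-benchmark: EfficientNet-B0 has far fewer FLOPs than
# ResNet-50, yet its GPU latency can be comparable or worse, especially at
# small batch sizes. Models and batch size are illustrative choices.
import torch
import torchvision.models as models

def gpu_latency_ms(model, batch_size=8, iters=50):
    model = model.eval().cuda()
    x = torch.randn(batch_size, 3, 224, 224, device="cuda")
    # Warm-up so cuDNN autotuning and lazy initialization don't skew timings.
    with torch.no_grad():
        for _ in range(10):
            model(x)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    with torch.no_grad():
        for _ in range(iters):
            model(x)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # average ms per forward pass

print("ResNet-50:       ", gpu_latency_ms(models.resnet50()), "ms")
print("EfficientNet-B0: ", gpu_latency_ms(models.efficientnet_b0()), "ms")
```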
Did you look into inference latency on GPUs in your paper?
GamerMinion t1_jddlqit wrote
Reply to comment by brownmamba94 in [R] Introducing SIFT: A New Family of Sparse Iso-FLOP Transformations to Improve the Accuracy of Computer Vision and Language Models by CS-fan-101
Yes, theory is one thing, but you can't build ASICs for everything due to the cost involved.
Did you look into sparsity at latency-equivalent scales, i.e. the same latency but a bigger, sparser model?
I would be very interested in results like that, especially for GPU-like accelerators (e.g. Nvidia's AGX platforms, which use its Ampere GPU architecture), since latency is a primary concern in high-value computer vision applications such as autonomous driving.
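To make the comparison concrete, this is the kind of naive experiment I have in mind, as a sketch only: it reuses the `gpu_latency_ms` helper from the earlier comment and stands in unstructured magnitude pruning (`torch.nn.utils.prune`) for whatever sparsification scheme the paper actually uses.

```python
# Sketch of a "latency-equivalent" comparison: take a wider model, prune it
# to a target unstructured sparsity, and time it against the dense baseline.
# With standard dense GPU kernels the zeroed weights are still multiplied,
# so latency typically does not drop - which is exactly why hardware support
# (or structured sparsity) matters for this question.
import torch
import torch.nn.utils.prune as prune
import torchvision.models as models

def sparsify(model, amount=0.75):
    # Zero out `amount` of the weights in every conv/linear layer (magnitude pruning).
    for module in model.modules():
        if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
            prune.l1_unstructured(module, name="weight", amount=amount)
            prune.remove(module, "weight")  # bake the mask into the weight tensor
    return model

dense = models.resnet50()
sparse = sparsify(models.wide_resnet50_2(), amount=0.75)  # bigger but 75% sparse

# gpu_latency_ms() is the timing helper from the sketch above.
print("dense ResNet-50:          ", gpu_latency_ms(dense), "ms")
print("75%-sparse WideResNet-50: ", gpu_latency_ms(sparse), "ms")
```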