Submitted by Open-Dragonfly6825 t3_10s3u1s in deeplearning
BellyDancerUrgot t1_j6zyiqm wrote
I’ll be honest, I don’t really know what FPGAs (I reckon they are an ASIC for matrix operations?) do and how they do it but tensor cores already provide optimization for matrix / tensor operations and fp16 and mixed precision has been available for quite a few years now. Ada and hopper even enable insane performance improvements for fp8 operations. Is there any real verifiable benchmark that compares training and inference time of the two?
On top of that, there's the obvious CUDA monopoly that Nvidia keeps a tight leash on. Without software, even the best hardware is useless, and almost everything is optimized to run on a CUDA backend.
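To make the tensor core / mixed precision point concrete, here is a minimal sketch of a mixed-precision training step using PyTorch's AMP utilities, assuming a CUDA-capable GPU; the model, sizes, and optimizer are placeholders, not anything from the thread. Under autocast, eligible matmuls and convolutions run in fp16 (and hit tensor cores on recent hardware) while master weights stay in fp32:

```python
import torch

device = torch.device("cuda")            # everything below assumes the CUDA backend
model = torch.nn.Linear(1024, 1024).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()     # scales the loss to avoid fp16 gradient underflow

x = torch.randn(64, 1024, device=device)
target = torch.randn(64, 1024, device=device)

optimizer.zero_grad()
with torch.cuda.amp.autocast():          # ops run in reduced precision where it is safe
    loss = torch.nn.functional.mse_loss(model(x), target)
scaler.scale(loss).backward()            # backward pass on the scaled loss
scaler.step(optimizer)                   # unscales gradients, then updates fp32 weights
scaler.update()
```

This is also an illustration of the software point above: the whole pipeline targets the CUDA backend, and enabling tensor cores is a few lines of framework code rather than a hardware design effort.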
Open-Dragonfly6825 OP t1_j72pzlc wrote
FPGAs are reconfigurable hardware accelerators. That is, you could theoretically "synthesize" (implement) any digital circuit on an FPGA, given that the FPGA has enough "resources".
This would let the user deploy custom hardware solutions for virtually any application, which can be far more optimized than software solutions (including those running on GPUs).
You could implement tensor cores or a TPU using an FPGA. But, obviously, an ASIC is faster and more energy efficient than its equivalent FPGA implementation.
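As a rough picture of what "implementing a tensor core" means, here is a conceptual reference model (software, not hardware code) of the tile-level fused multiply-add a tensor core or TPU MAC array computes, D = A @ B + C with fp16 inputs and fp32 accumulation. The 4x4 tile size mirrors NVIDIA's tensor core operation; an FPGA version would map this dataflow onto DSP blocks instead of looping:

```python
import numpy as np

def tile_mma(A, B, C):
    """Multiply-accumulate one 4x4 tile: fp16 inputs, fp32 accumulator."""
    A = A.astype(np.float16)
    B = B.astype(np.float16)
    acc = C.astype(np.float32)
    for k in range(A.shape[1]):
        # Each step is one rank-1 update; in hardware all 64 MACs fire in parallel.
        acc += np.outer(A[:, k].astype(np.float32), B[k, :].astype(np.float32))
    return acc

A, B, C = (np.random.rand(4, 4) for _ in range(3))
print(np.allclose(tile_mma(A, B, C), A @ B + C, atol=1e-2))
```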
Tying into what you said: beyond all the "this is just theory, in practice things are different" caveats around FPGAs, programming GPUs with CUDA is way, way easier than programming FPGAs as of today.