Submitted by samobon t3_1040w4q in MachineLearning
ReginaldIII t1_j33ff9r wrote
Reply to comment by Nhabls in [News] AMD Instinct MI300 APU for AI and HPC announced by samobon
Except there is an ecosystem monopoly at the cluster level too. Some of the most established, scalable, and reliable software (in fields like bioinformatics, for example) only provides CUDA implementations of its key algorithms, and being able to accurately reproduce the results it computes is vital.
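To make "reproduce" concrete: floating-point reductions are order-sensitive, so even the same algorithm, correctly reimplemented on different hardware or a different backend, can drift in the low-order bits just because the summation order changes. A minimal NumPy sketch of the effect (a generic illustration, not taken from any of these packages):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000).astype(np.float32)

# Two mathematically identical sums computed with different reduction
# trees, the way different GPU backends would partition the work.
flat = x.sum()
tree = x.reshape(1000, 1000).sum(axis=0).sum()

print(flat, tree, flat == tree)  # typically differ in the low-order bits
```

Multiply that across thousands of kernels in a whole pipeline and you can see why people insist on running the exact CUDA build their results were produced with.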
This essentially limits that software to running only on large CUDA clusters. You can't reproduce the results without the scale of a cluster.
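And in practice the CUDA requirement is a hard gate, not a preference. A rough sketch of the pattern (hypothetical pipeline step; the torch calls themselves are real):

```python
import torch

def reconstruct(particles: torch.Tensor) -> torch.Tensor:
    """Toy stand-in for a CUDA-only processing step with no portable fallback."""
    if not torch.cuda.is_available():
        raise RuntimeError(
            "This step requires an NVIDIA GPU with CUDA; "
            "no CPU or ROCm code path exists."
        )
    particles = particles.to("cuda")
    # In real packages this would dispatch to hand-written CUDA kernels
    # that have no equivalent implementation on other hardware.
    return torch.fft.fftn(particles).abs()
```

Until someone rewrites and revalidates those kernels for other hardware, the software is CUDA-only, whatever the cluster vendor would prefer.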
Consider software for processing Cryo-Electron Microscopy and Ptychography data. Very, very few people are actually developing those software packages, but thousands of researchers around the world are using them at scale to process their micrographs. Those microscopists are not programmers, or really even cluster experts; they don't have the skillset to develop on these codebases. They just need it to work, reliably and reproducibly.
I've been working in HPC on a range of large-scale clusters for a long time, and there has been a massive demographic shift in the skillsets our cluster users have. A decade ago you wouldn't dream of letting anyone who wasn't an HPC expert near your cluster; if a team of non-HPC people needed HPC, you'd hire HPC experts into the team to develop the code and tune the workloads onto the cluster. Now we have an environment where non-HPC people can pay for access and run their workloads directly, because they lean on these pre-tinned software packages.