Exarctus
Exarctus t1_jcfmqqs wrote
Reply to comment by 1F9 in [N] PyTorch 2.0: Our next generation release that is faster, more Pythonic and Dynamic as ever by [deleted]
I think you’ve entirely misunderstood what PyTorch is and how it functions.
PyTorch is a front-end to libtorch, the C++ backend. Libtorch itself is a wrapper around various highly optimised libraries as well as CUDA implementations of specific ops. Virtually nothing computationally expensive is done in the Python layer.
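To make that concrete, here's a minimal sketch (illustrative only, timings will vary by machine) showing that a single PyTorch call dispatches the whole reduction into the C++ backend, while a Python-level loop pays interpreter overhead on every iteration:

```python
# Illustrative sketch: the torch.sum call below is just a thin dispatch into
# libtorch; the reduction itself runs in optimized C++ (or a CUDA kernel if the
# tensor lives on a GPU). Timings are illustrative, not a benchmark.
import time
import torch

x = torch.randn(10_000_000)

# Pure-Python loop: every iteration goes through the interpreter.
# (Subsampled to 1/1000th of the elements and still slow.)
t0 = time.perf_counter()
total_py = sum(x[i].item() for i in range(0, x.numel(), 1000))
t1 = time.perf_counter()

# Single PyTorch op: one Python call, all work done in the C++ backend.
t2 = time.perf_counter()
total_torch = x.sum()
t3 = time.perf_counter()

print(f"Python-level loop (1/1000th of the data): {t1 - t0:.4f} s")
print(f"torch.sum over the full tensor:           {t3 - t2:.4f} s")
```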
Exarctus t1_j0ztwve wrote
Reply to comment by vprokopev in [D] Why are we stuck with Python for something that require so much speed and parallelism (neural networks)? by vprokopev
I’ve not encountered many situations where I couldn’t use existing vectorized PyTorch indexing operations to do complicated masking, indexing, etc., and I’ve written some pretty complex code bases during my PhD.
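For illustration, here's a small sketch (the tensors and shapes are made up) of the kind of masking and advanced indexing that stays fully vectorized in PyTorch, with no Python-level loops:

```python
import torch

scores = torch.randn(4, 10)            # e.g. a batch of per-token scores
lengths = torch.tensor([10, 7, 3, 9])  # valid length of each row

# Boolean mask of valid positions, built entirely from tensor ops.
mask = torch.arange(scores.size(1)) < lengths.unsqueeze(1)   # shape (4, 10)

# Masked fill: invalidate padded positions before a softmax.
masked_scores = scores.masked_fill(~mask, float("-inf"))

# Advanced indexing: gather the score at the last valid position of each row.
last_valid = scores[torch.arange(scores.size(0)), lengths - 1]

print(masked_scores.softmax(dim=-1).shape, last_valid.shape)
```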
Alternatively, you can write your code in C++/CUDA C however you like and provide PyTorch bindings to include it in your Python workflow.
Exarctus t1_j0zqq3b wrote
Reply to comment by vprokopev in [D] Why are we stuck with Python for something that require so much speed and parallelism (neural networks)? by vprokopev
The vast majority of PyTorch function calls are implemented in either CUDA C or OpenMP-parallelized C++.
Python is only used as a front-end; very little of the computational workload is done by the Python interpreter.
Additionally, the C++ API for PyTorch is very much in the same style as the Python API. Obviously you have some additional flexibility in how you optimize your code, but the tensor-based operations are the same.
PyTorch also makes it trivially easy to write optimized CUDA C code and provide Python bindings for it, so you keep the faster development time of Python while retaining the computational benefits of C/C++/CUDA C for the typical workloads.
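As a rough sketch of that binding workflow (the extension name `my_ext` and the function `scaled_add` are invented for the example; this compiles at import time, so it needs a C++ toolchain installed, and a real CUDA kernel would be passed via `cuda_sources`):

```python
import torch
from torch.utils.cpp_extension import load_inline

cpp_source = r"""
#include <torch/extension.h>

// Runs entirely in libtorch / C++ once called from Python.
torch::Tensor scaled_add(torch::Tensor a, torch::Tensor b, double alpha) {
    return a + alpha * b;
}
"""

my_ext = load_inline(
    name="my_ext",
    cpp_sources=cpp_source,
    functions=["scaled_add"],   # auto-generates the pybind11 binding
)

x, y = torch.randn(1024), torch.randn(1024)
print(torch.allclose(my_ext.scaled_add(x, y, 0.5), x + 0.5 * y))  # True
```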
Exarctus t1_iyd7r5i wrote
Reply to comment by Ronny_Jotten in Does anyone uses Intel Arc A770 GPU for machine learning? [D] by labloke11
I am an ML scientist, and the statement you're making about AMD GPUs only "being fine in limited circumstances" is absolutely false. Any network that you can create for a CUDA-enabled GPU can also be ported to an AMD GPU with a single line of code changed when working with PyTorch.
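As a hedged sketch of what that looks like in practice: on PyTorch's ROCm builds the HIP backend is exposed through the same `cuda` device string and API, so device-agnostic code like the following typically runs unchanged on AMD GPUs (assuming a ROCm build of PyTorch is installed).

```python
import torch
import torch.nn as nn

# The only line that cares about the hardware at all; on ROCm builds the
# "cuda" device maps to the AMD/HIP backend.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).to(device)
x = torch.randn(32, 128, device=device)
print(model(x).shape)  # torch.Size([32, 10]) on NVIDIA, AMD, or CPU alike
```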
The issues arise when developers of particular external libraries that you might want to use only develop for one platform. This is **only** an issue when those developers write customized CUDA C implementations for specific parts of their network but don't use HIP for cross-compatibility. It is not an issue if the code is pure PyTorch.
This is not an issue with AMD; it's purely down to laziness (and possibly inexperience) on the part of the developer.
Regardless, whenever I work with AMD GPUs and implement or derive from other people's work, it does sometimes take extra development time to convert, e.g., any customized CUDA C libraries the developer has created over to HIP, but this in itself isn't too difficult as there are conversion tools available.
Exarctus t1_iyd2ety wrote
Reply to comment by Ronny_Jotten in Does anyone uses Intel Arc A770 GPU for machine learning? [D] by labloke11
My comment was aimed more towards ML scientists (the vast majority of whom are Linux enthusiasts) who are developing their own architectures.
Translating CUDA to HIP is also not particularly challenging, as there are tools available (e.g. ROCm's hipify tools) which do this for you.
Exarctus t1_iycjag7 wrote
Reply to comment by kaskoosek in Does anyone uses Intel Arc A770 GPU for machine learning? [D] by labloke11
PyTorch has a ROCm distribution, so most reasonably modern AMD cards should be fine…
Exarctus t1_ix7avzz wrote
Reply to comment by leoholt in [R] Tips on training Transformers by parabellum630
You basically want to extensively test that the sequential elements in your input are being mapped to unique vectors.
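A rough sketch of that kind of check, using standard sinusoidal encodings purely as a stand-in for whatever positional scheme you're actually using:

```python
import torch

def sinusoidal_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Standard sinusoidal positional encodings, used here only as an example."""
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)       # (L, 1)
    div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                    * (-torch.log(torch.tensor(10000.0)) / d_model))    # (d/2,)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

pe = sinusoidal_encoding(seq_len=512, d_model=64)

# Pairwise distances between all position vectors; uniqueness means no
# off-diagonal distance is (numerically) zero.
dists = torch.cdist(pe, pe)
dists.fill_diagonal_(float("inf"))
assert dists.min() > 1e-6, "two positions collapsed onto the same vector"
print("all 512 positions map to distinct vectors")
```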
Exarctus t1_itl63zr wrote
Reply to comment by benanderson89 in Apple testing Apple Silicon Mac Pro with 24-core CPU, 76-core GPU, 192GB of memory by prehistoric_knight
Hi. I work in simulation.
Absolutely nobody is spending 100K on a single (laptop) workstation. What a ridiculously made-up number. You can buy a reasonably large GPU farm for that amount of money and have several orders of magnitude more compute. The vast majority of simulation codes are designed to scale well with problem size on GPUs.
Exarctus t1_jdqjsr9 wrote
Reply to comment by [deleted] in Nvidia Speeds Key Chipmaking Computation by 40x by Vucea
This has nothing to do with this post…