boadie t1_j8hhtwi wrote
In the opposite direction from your question is a very interesting project, tiny-cuda-nn, which is implemented as close to the metal as possible and is very fast: https://github.com/NVlabs/tiny-cuda-nn
Also in the vague neighbourhood of your question is the Triton compiler. On the surface it is a Python JIT compiler, but its language coverage is much smaller than Python's, so you can view it as a small DSL. All the interesting bits are way below that level: https://openai.com/blog/triton/