harishprab OP t1_iucyp3o wrote
Reply to comment by pommedeterresautee in [R] Open source inference acceleration library - voltaML by harishprab
Maybe we’ll have it as a dependency. We’re also planning to do some of our own work on NLP, so we thought we’d keep it as a non-dependency for now.
harishprab OP t1_iucycq7 wrote
Reply to comment by limpbizkit4prez in [R] Open source inference acceleration library - voltaML by harishprab
We have inference acceleration for computer vision, NLP and decision-tree models.
harishprab OP t1_iucy9k9 wrote
Reply to comment by _Arsenie_Boca_ in [R] Open source inference acceleration library - voltaML by harishprab
Thanks. HF Accelerate basically does this for Intel chips; I haven’t seen them support TensorRT, though I could be wrong. Neural Magic is mostly about quantisation-aware training and pruning techniques, whereas we focus on post-training techniques. We should try Nebullvm, they’re a great library too.
harishprab OP t1_iucy0ne wrote
Reply to comment by PlayOffQuinnCook in [R] Open source inference acceleration library - voltaML by harishprab
We use the TorchFX library to do this on CPU, and TensorRT handles it on GPU. We’re not using any custom function for the fusion; TorchFX and TensorRT do it anyway.
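voltaML’s exact invocation isn’t shown here, but torch.fx does ship a reference conv+bn fusion pass (`torch.fx.experimental.optimization.fuse`), which gives a sense of what this kind of graph-level fusion looks like. A minimal sketch (the `SmallNet` model is just an illustrative example, not from voltaML):

```python
import torch
import torch.nn as nn
from torch.fx.experimental.optimization import fuse

# Tiny example model with a fusible conv -> bn -> relu pattern
class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3)
        self.bn = nn.BatchNorm2d(8)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

model = SmallNet().eval()          # fusion assumes inference mode
fused = fuse(model)                # folds batchnorm params into the conv weights

x = torch.randn(1, 3, 16, 16)
# The fused graph should be numerically equivalent to the original
assert torch.allclose(model(x), fused(x), atol=1e-5)
```

After fusion the batchnorm is gone from the traced graph, so at inference time you pay for one conv instead of a conv plus a normalization pass.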
harishprab OP t1_iucxpg1 wrote
Reply to comment by pommedeterresautee in [R] Open source inference acceleration library - voltaML by harishprab
Hey. You’ve done amazing work with transformer-deploy. We have actually mentioned you in our work. We just wanted voltaML to be one repo for all ML, CV and NLP needs.
harishprab OP t1_iugkqy9 wrote
Reply to [R] Open source inference acceleration library - voltaML by harishprab
Right now it supports only the models that are supported by these libraries. We tried doing fusion manually earlier but ran into many issues given the diversity of models, so we stuck to TorchFX and TRT. Maybe in the future we can make it modular so that it works on any model.