Submitted by harishprab t3_yga0s1 in MachineLearning

We have recently open sourced our inference acceleration library, voltaML.

⚡VoltaML is a lightweight library to convert and run your ML/DL models in high-performance inference runtimes like TensorRT, TorchScript, ONNX and TVM.

We would love for the Reddit and open-source communities to use it, give feedback, and help us improve the library.

https://github.com/VoltaML/voltaML


Comments


PlayOffQuinnCook t1_iu7ualp wrote

Congrats on open sourcing! Quick question on fusion: how do you fuse layers like conv, bn, relu, etc., if they are not named conv1, bn1, relu in the nn.Module?


_Arsenie_Boca_ t1_iu7yz56 wrote

Looks promising. A comparison with other competitors (hf accelerate, neuralmagic, nebullvm, ...) would be great


limpbizkit4prez t1_iu965oz wrote

Do you have any benchmarks against other frameworks? And have you benchmarked other types of models, or are you doing something specific for NLP?


LetterRip t1_iu9be41 wrote

Have you tried it with diffusers/stable diffusion?


pommedeterresautee t1_iu9vc6f wrote

Hi, I am one of the authors of transformer deploy. I see you have copied most of the files for the transformer part. That's really cool, and I really appreciate that you kept the licenses. May I ask you to cite our work in the README?

Moreover, if I may, why did you copy the files instead of just importing the project as a dependency? You would get the maintenance for free :-)


harishprab OP t1_iucy9k9 wrote

Thanks. HF Accelerate is basically doing it for Intel chips; I haven't seen them support TensorRT, though I could be wrong. Neural Magic is mostly about quantisation-aware training and pruning techniques, whereas we focus on post-training techniques. We should try Nebullvm; they're a great library too.
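To illustrate the post-training side of that distinction, here is a minimal sketch of post-training dynamic quantization using plain PyTorch APIs (toy model; no voltaML APIs are assumed):

```python
# Sketch: post-training dynamic quantization -- weights are quantized
# to int8 after training, with no retraining, unlike QAT.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10)).eval()

# Quantize all Linear layers' weights to int8 post hoc.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 64)
fp32_out = model(x)
int8_out = qmodel(x)
```

The quantized model's outputs stay close to the fp32 model's, which is why no retraining is needed.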


PlayOffQuinnCook t1_iueq6l4 wrote

I understand that. But let's say I have these operators named c1, b1, r1 instead of what it expects; the fusion logic won't work. So my question was whether this library works only on a fixed set of models defined in the library itself, or whether it can work on arbitrary models users write.


harishprab OP t1_iugkqy9 wrote

Right now it only supports the models that are supported by these libraries. We tried fusion manually earlier but ran into many issues given the diversity of models, so we stuck with torch.fx and TensorRT. Maybe in the future we can make it modular so that it can work on any model.
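For reference, here is a minimal sketch of what name-independent fusion via torch.fx looks like (toy model; plain PyTorch, not voltaML's internals). The fuser matches op patterns in the traced graph, not attribute names, which addresses the question above:

```python
# Sketch: conv+bn fusion via torch.fx pattern matching.
# Module names are deliberately non-standard to show names don't matter.
import torch
import torch.nn as nn
from torch.fx.experimental.optimization import fuse

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.c1 = nn.Conv2d(3, 8, 3)   # not "conv1"
        self.b1 = nn.BatchNorm2d(8)    # not "bn1"
        self.r1 = nn.ReLU()            # not "relu"

    def forward(self, x):
        return self.r1(self.b1(self.c1(x)))

model = Net().eval()          # fusion requires eval mode
fused = fuse(model)           # folds BatchNorm into the preceding Conv

x = torch.randn(1, 3, 16, 16)
assert torch.allclose(model(x), fused(x), atol=1e-5)
```

Because the matching is done on the traced graph's op sequence (Conv2d followed by BatchNorm2d), the `c1`/`b1` naming is irrelevant.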
