Submitted by harishprab t3_yga0s1 in MachineLearning

We have recently open sourced our inference acceleration library, voltaML.

⚡VoltaML is a lightweight library to convert and run your ML/DL models in high-performance inference runtimes like TensorRT, TorchScript, ONNX and TVM.
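For context, here is a minimal sketch of what exporting a model to two of these runtimes looks like with plain PyTorch APIs. This only illustrates the export targets that voltaML automates; it is not voltaML's own API.

```python
import torch
import torchvision.models as models

model = models.resnet18().eval()
example = torch.randn(1, 3, 224, 224)

# TorchScript: trace the model into a serializable, Python-free graph
scripted = torch.jit.trace(model, example)
scripted.save("resnet18_ts.pt")

# ONNX: export the same graph for ONNX Runtime or TensorRT to consume
torch.onnx.export(
    model, example, "resnet18.onnx",
    opset_version=13,
    input_names=["input"], output_names=["output"],
)
```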

We would love for the Reddit and open-source communities to use it, give feedback and help us improve the library.

https://github.com/VoltaML/voltaML

Comments

pommedeterresautee t1_iu9vc6f wrote

Hi, I am one of the authors of transformer-deploy. I see you have copied most of the files for the transformer part. That's really cool, and I appreciate that you kept the licenses. May I ask you to cite our work in the README?

Moreover, if I may ask, why did you copy the code instead of just importing it as a dependency? You would get the maintenance for free :-)

harishprab OP t1_iucxpg1 wrote

Hey. You've done amazing work with transformer-deploy. We have actually mentioned your work in ours. We just wanted voltaML to be a single repo for all ML, CV and NLP needs.

harishprab OP t1_iucyp3o wrote

Maybe we'll have it as a dependency later. We're also planning to do some of our own work on NLP, so we thought we'd keep it as a non-dependency for now.

LetterRip t1_iu9be41 wrote

Have you tried it with diffusers/stable diffusion?

PlayOffQuinnCook t1_iu7ualp wrote

Congrats on open-sourcing! Quick question on fusion: how do you fuse layers like conv, bn, relu, etc. if they are not named conv1, bn1, relu in the nn.Module?

PlayOffQuinnCook t1_iu7ug9w wrote

Oh, I guess the library supports accelerating a couple of well-known models defined in the models module?

harishprab OP t1_iucy0ne wrote

We use the TorchFX library to do this on CPU, and TensorRT does it on GPU. We're not using any custom function for the fusing; TorchFX and TensorRT handle it anyway.
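To illustrate why the module names don't matter: TorchFX matches patterns in the traced graph, not attribute names. Here's a minimal sketch using PyTorch's experimental conv/bn folding pass (an illustration, not voltaML code; assumes a recent PyTorch):

```python
import torch
import torch.nn as nn
from torch.fx.experimental.optimization import fuse

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # deliberately non-standard names; graph fusion still finds the pattern
        self.c1 = nn.Conv2d(3, 16, 3, padding=1)
        self.b1 = nn.BatchNorm2d(16)
        self.r1 = nn.ReLU()

    def forward(self, x):
        return self.r1(self.b1(self.c1(x)))

model = Net().eval()   # fusion requires eval mode
fused = fuse(model)    # folds the batchnorm into the preceding conv
print(fused.graph)     # the conv/bn pair is now a single conv node
```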

PlayOffQuinnCook t1_iueq6l4 wrote

I understand that. But let's say I have these operators named c1, b1, r1 instead of what it expects; the fusion logic won't work. So my question was whether this library works on only a fixed set of models defined in the library itself, or whether it can work against any model users write.

_Arsenie_Boca_ t1_iu7yz56 wrote

Looks promising. A comparison with competitors (hf accelerate, neuralmagic, nebullvm, ...) would be great.

harishprab OP t1_iucy9k9 wrote

Thanks. HF Accelerate is basically doing this for Intel chips; I haven't seen them support TensorRT, but I could be wrong. Neural Magic is mostly about quantisation-aware training and pruning techniques, whereas we focus on post-training techniques. We should try Nebullvm; they're a great library too.

limpbizkit4prez t1_iu965oz wrote

Do you have any benchmarks against other frameworks? And have you benchmarked other types of models, or are you doing something specific to NLP?

harishprab OP t1_iucycq7 wrote

We have inference acceleration for computer vision, NLP and decision-tree models.

limpbizkit4prez t1_iud9604 wrote

Oh wow, I have no idea how I missed the other parts of the README that show the other types of applications. Do you plan on showing any benchmarks against other frameworks?

harishprab OP t1_iugkqy9 wrote

Right now it works only for the models that are supported by these libraries. We tried manual fusion earlier but ran into many issues given the diversity of models, so we stuck with TorchFX and TensorRT. Maybe in the future we can make it modular so that it can work on any model.
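For anyone curious why arbitrary models are hard: TorchFX's symbolic tracing, which the fusion relies on, fails on data-dependent control flow, and that pattern shows up across diverse architectures. A small sketch of the failure mode (my own example, not voltaML code):

```python
import torch
import torch.fx as fx

class Dynamic(torch.nn.Module):
    def forward(self, x):
        # branching on a tensor value cannot be resolved at trace time
        if x.sum() > 0:
            return x * 2
        return x

try:
    fx.symbolic_trace(Dynamic())
except fx.proxy.TraceError as e:
    print("tracing failed:", e)
```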
