Haha, that's amazing to hear, because I had a similar experience! The data scientist on the team was re-initializing the model, the image matrices, and other objects in the data pipeline from scratch on every single inference. I had to decouple those in a similar way to what you did.
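In case it helps anyone reading along, here's a minimal sketch of what the fix looked like. I'm assuming ONNX Runtime here, and the class and file names (`Inferencer`, `model.onnx`) are made up for illustration; the real pipeline was framework-specific, but the decoupling idea is the same:

```python
import numpy as np
import onnxruntime as ort  # assumption: any inference runtime works the same way

class Inferencer:
    def __init__(self, model_path, input_shape=(1, 3, 224, 224)):
        # One-time setup: session creation and buffer allocation are expensive,
        # so they live in __init__ instead of the per-frame path.
        self.session = ort.InferenceSession(model_path)
        self.input_name = self.session.get_inputs()[0].name
        # Pre-allocate the image buffer instead of building a new array per call.
        self.buffer = np.empty(input_shape, dtype=np.float32)

    def infer(self, frame):
        # Per-inference work only: copy the new frame into the reused buffer.
        np.copyto(self.buffer, frame)
        return self.session.run(None, {self.input_name: self.buffer})

# Setup happens once, outside the hot loop:
# engine = Inferencer("model.onnx")
# for frame in frames:
#     outputs = engine.infer(frame)
```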
Everyone assumes the model is the problem, but I'm starting to think the rest of the code is usually what actually needs to be optimized.
Oh wow, I'd never heard of DeepStream during my earlier project. Have you used it? Were you able to bring it in mid-project, or did you have to use it from the start?
Cool, how did you learn all those techniques? And how did you determine which one was the major cause of the excessive memory usage / slow inference time?
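(For what it's worth, the crude way I've tried to answer that question in my own projects was just the standard-library profilers: `cProfile` for time and `tracemalloc` for memory. A minimal sketch, where `run_pipeline` is a hypothetical stand-in for one full pass of the pipeline:)

```python
import cProfile
import pstats
import tracemalloc

def profile_pipeline(run_pipeline):
    # Track allocations and time for one end-to-end pass of the pipeline.
    tracemalloc.start()
    profiler = cProfile.Profile()
    profiler.enable()
    run_pipeline()
    profiler.disable()
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(f"peak memory: {peak / 1e6:.1f} MB")
    # Show the slowest calls, to see whether the model or the
    # surrounding code dominates the runtime.
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```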
Interesting, I hadn't heard of TVM before! I'm wondering, did you come across cases in your work where it wasn't the model that was the worst bottleneck, but the pre-processing / data pipeline that actually needed to be optimized? I had one experience like that, so I'm just wondering how common it is.