Submitted by muunbo t3_y1pui4 in deeplearning
Alors_HS t1_is1871l wrote
Reply to comment by muunbo in Optimizing deep learning inference to run on the edge? by muunbo
Well, I needed to solve my problem, so I looked at papers and software solutions. Then it was a slow process of many iterations of trial and error.
I couldn't tell you what would be best for your use case tho. I'm afraid it's been too long for me to remember the details. Besides, each method may be more or less effective depending on the accuracy/inference-time trade-off you need and the method and means of training you can afford.
I can give you a tip: I initialized my inference script only once per boot, and then put it in "waiting mode" so I wouldn't have to initialize the model for each inference (that's the biggest source of wasted time). Then, upon receiving a socket message, the script would read a data file, do an inference pass, write the results to another file, delete/move the data to storage, and wait for the next socket message. It's obvious when you think about it that you absolutely don't want to launch/initialize your inference script once per inference, but, well, you never know what people think about :p
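Roughly, that pattern looks something like the sketch below. This is not the commenter's actual code: `load_model`, `run_inference`, the port, and the file paths are all placeholders you'd swap for whatever your stack uses; the point is just that the heavy setup happens once and the loop only does per-inference work.

```python
import json
import shutil
import socket
from pathlib import Path

# Hypothetical stand-ins -- replace with your real model loading and forward pass.
def load_model():
    ...  # e.g. load weights / a TorchScript or TFLite artifact, once at boot

def run_inference(model, data):
    ...  # single forward pass, return predictions

HOST, PORT = "127.0.0.1", 5000            # assumed local socket trigger
DATA_FILE = Path("/tmp/input.json")        # written by the producer process
RESULT_FILE = Path("/tmp/result.json")
ARCHIVE_DIR = Path("/tmp/processed")

def main():
    model = load_model()                   # expensive init happens exactly once per boot
    ARCHIVE_DIR.mkdir(exist_ok=True)

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, PORT))
        srv.listen()
        while True:                        # "waiting mode": block until someone pings us
            conn, _ = srv.accept()
            with conn:
                conn.recv(64)              # message content doesn't matter, it's just a trigger
                data = json.loads(DATA_FILE.read_text())
                result = run_inference(model, data)
                RESULT_FILE.write_text(json.dumps(result))
                # move the consumed input to storage and go back to waiting
                shutil.move(str(DATA_FILE), ARCHIVE_DIR / DATA_FILE.name)

if __name__ == "__main__":
    main()
```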
muunbo OP t1_is2atui wrote
Haha, that's amazing to hear because I had a similar experience! The data scientist on the team was re-initializing the model, the image matrices, and other objects in the data pipeline over and over again for every inference. I had to decouple those in a similar way to what you did.
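For what it's worth, that decoupling is basically hoisting construction out of the hot path. A toy illustration, with made-up class names and shapes rather than the actual pipeline:

```python
import numpy as np

# Hypothetical pipeline pieces -- Preprocessor/Model stand in for the real objects.
class Preprocessor:
    def __init__(self, size=(224, 224)):
        # allocate the image buffer once instead of on every call
        self.buffer = np.empty((*size, 3), dtype=np.float32)

    def __call__(self, image):
        np.copyto(self.buffer, image)                     # reuse the same memory each time
        np.divide(self.buffer, 255.0, out=self.buffer)    # normalize in place
        return self.buffer

class Model:
    def __init__(self):
        ...  # expensive: weights, graph, session, etc.

    def predict(self, x):
        ...

# Anti-pattern: everything rebuilt inside the per-inference call.
def predict_slow(image):
    return Model().predict(Preprocessor()(image))

# Decoupled: build once at service start, reuse for every inference.
PREPROCESSOR = Preprocessor()
MODEL = Model()

def predict_fast(image):
    return MODEL.predict(PREPROCESSOR(image))
```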
Everyone assumes the model is the problem, but I'm starting to think the rest of the code is usually what actually needs to be optimized.