Submitted by muunbo t3_y1pui4 in deeplearning
Alors_HS t1_is1871l wrote
Reply to comment by muunbo in Optimizing deep learning inference to run on the edge? by muunbo
Well, I needed to solve my problem, so I looked at papers and software solutions. Then it was a slow process of many iterations of trial and error.
I couldn't tell you what would be best for your use case tho. I'm afraid it's been too long for me to remember the details. Besides, each method may be more or less effective depending on the accuracy/inference-time trade-off you need and the method and means of training you can afford.
I can give you a tip: I initialized my inference script only once per boot, and then put it in "waiting mode" so I wouldn't have to initialize the model for each inference (that's the biggest source of wasted time). Then, upon receiving a socket message, the script would read a data file, do an inference pass, write the results to another file, delete/move the data to storage, and wait for the next socket message. It's obvious when you think about it that you absolutely don't want to launch/initialize your inference script once per inference, but, well, you never know what people think about :p
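Roughly, that pattern looks something like the sketch below. This is not the commenter's actual code: `load_model`, `run_inference`, the port, and the file paths are all placeholders you'd swap for whatever your stack uses; the point is just that the heavy setup happens once and the loop only does per-inference work.

```python
import json
import shutil
import socket
from pathlib import Path

# Hypothetical stand-ins -- replace with your real model loading and forward pass.
def load_model():
    ...  # e.g. load weights / a TorchScript or TFLite artifact, once at boot

def run_inference(model, data):
    ...  # single forward pass, return predictions

HOST, PORT = "127.0.0.1", 5000            # assumed local socket trigger
DATA_FILE = Path("/tmp/input.json")        # written by the producer process
RESULT_FILE = Path("/tmp/result.json")
ARCHIVE_DIR = Path("/tmp/processed")

def main():
    model = load_model()                   # expensive init happens exactly once per boot
    ARCHIVE_DIR.mkdir(exist_ok=True)

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, PORT))
        srv.listen()
        while True:                        # "waiting mode": block until someone pings us
            conn, _ = srv.accept()
            with conn:
                conn.recv(64)              # message content doesn't matter, it's just a trigger
                data = json.loads(DATA_FILE.read_text())
                result = run_inference(model, data)
                RESULT_FILE.write_text(json.dumps(result))
                # move the consumed input to storage and go back to waiting
                shutil.move(str(DATA_FILE), ARCHIVE_DIR / DATA_FILE.name)

if __name__ == "__main__":
    main()
```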
muunbo OP t1_is2atui wrote
Haha, that's amazing to hear because I had a similar experience! The data scientist on the team was re-initializing the model, the image matrices, and other objects in the data pipeline over and over again for every inference. I had to decouple those in a similar way to what you did.
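For what it's worth, that decoupling is basically hoisting construction out of the hot path. A toy illustration, with made-up class names and shapes rather than the actual pipeline:

```python
import numpy as np

# Hypothetical pipeline pieces -- Preprocessor/Model stand in for the real objects.
class Preprocessor:
    def __init__(self, size=(224, 224)):
        # allocate the image buffer once instead of on every call
        self.buffer = np.empty((*size, 3), dtype=np.float32)

    def __call__(self, image):
        np.copyto(self.buffer, image)                     # reuse the same memory each time
        np.divide(self.buffer, 255.0, out=self.buffer)    # normalize in place
        return self.buffer

class Model:
    def __init__(self):
        ...  # expensive: weights, graph, session, etc.

    def predict(self, x):
        ...

# Anti-pattern: everything rebuilt inside the per-inference call.
def predict_slow(image):
    return Model().predict(Preprocessor()(image))

# Decoupled: build once at service start, reuse for every inference.
PREPROCESSOR = Preprocessor()
MODEL = Model()

def predict_fast(image):
    return MODEL.predict(PREPROCESSOR(image))
```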
Everyone assumes the model is the problem, but I'm starting to think the rest of the code is usually what actually needs to be optimized.