Submitted by amitraderinthemaking t3_xwltow in MachineLearning
harharveryfunny t1_ir7b7aw wrote
If model load time is the limiting factor, then ONNX runtime inference speed may be irrelevant. You may need to load the model once and reuse it, rather than reloading it for every request.
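For what it's worth, here's a minimal sketch of that load-once pattern with onnxruntime (the model path, input shape, and `predict` helper are placeholders, not anything from your setup):

```python
import numpy as np
import onnxruntime as ort

# Pay the expensive load/initialization cost once, at startup.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

# Reuse the same session for every call; only inference cost is paid per request.
def predict(batch: np.ndarray):
    return session.run(None, {input_name: batch})
```

If you're serving requests, keep the session alive in a long-running process (a small web server, a daemon, etc.) instead of spawning a fresh process per inference.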
There's a new runtime (a TensorRT competitor) called AITemplate available from Facebook that does support CPU and is meant to be very fast, but I don't believe it supports ONNX yet, and anyway you're not going to get a 50x speed-up just by switching to a faster runtime on the same hardware.
Another alternative might be to run it in the cloud rather than locally.