Submitted by Dazzling_Koala6834 t3_zkn9jb in MachineLearning
zenpianist t1_j01jvp7 wrote
I think real-world ML dev and ops is much messier. Something as simple as a decoupled inference pipeline would mean a lot to us, instead of having to retrigger the whole workflow when something fails. At TB scale, even snapshotting the outputs from each stage became ridiculously expensive, if not downright impossible. Would love to see how you address those.