RoaRene317 t1_jdfnzna wrote on March 24, 2023 at 1:21 AM

My suggestion is to use 8-bit or 4-bit quantization. You can also use automatic device mapping in Transformers, which can offload part of the model to your CPU (warning: it uses a lot of system memory [RAM]).
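
A rough sketch of what that could look like, in case it helps. It assumes recent versions of `transformers`, `accelerate`, and `bitsandbytes` are installed. The helper names (`weight_memory_gib`, `load_quantized`) are made up for illustration, and the estimate covers only the weights, not activations or the KV cache.

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


def load_quantized(model_id: str, four_bit: bool = False):
    """Load a causal LM with 8-bit (default) or 4-bit weights.

    Hypothetical helper: model_id is whatever checkpoint you are using.
    """
    # Imported here so the estimate above works without these packages.
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    if four_bit:
        quant = BitsAndBytesConfig(load_in_4bit=True)
    else:
        quant = BitsAndBytesConfig(load_in_8bit=True)

    return AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant,
        # "auto" lets Accelerate decide where each layer goes
        # (GPU first, then CPU RAM if the GPU is full).
        device_map="auto",
    )
```

For a 7B-parameter model, `weight_memory_gib(7e9, 16)` comes out to roughly 13 GiB, versus about 6.5 GiB at 8 bits and about 3.3 GiB at 4 bits. Depending on your versions, spilling quantized layers to the CPU may need extra settings, so check the Transformers quantization docs if the load fails.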