[R] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models - Massachusetts Institute of Technology and NVIDIA, Guangxuan Xiao et al. - Enables INT8 inference for LLMs larger than 100B parameters, including OPT-175B, BLOOM-176B, and GLM-130B. Submitted by Singularian2501 t3_z1b2rp on November 21, 2022 at 9:37 PM in MachineLearning 13 comments 55
Acceptable-Cress-374 t1_ixbzdfe wrote on November 22, 2022 at 8:19 AM Would this mean that it could become feasible to run gpt-neox inference on a 3090/4090 w/ 24 GB VRAM? That would be huge! 8
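The commenter's question boils down to back-of-envelope arithmetic: GPT-NeoX-20B has roughly 20B parameters, so FP16 weights alone take about 37 GiB, which overflows a 24 GB card, while INT8 weights take about 19 GiB and leave headroom for activations and the KV cache. A minimal sketch of that estimate (the 20e9 parameter count is a round approximation; real runtime overhead is not modeled):

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Estimate weight storage in GiB; ignores activations, KV cache,
    and framework overhead, so real usage will be higher."""
    return num_params * bytes_per_param / 1024**3

# GPT-NeoX-20B, approximated here as a round 20e9 parameters.
params = 20e9
fp16 = weight_memory_gib(params, 2)  # 2 bytes/param in FP16
int8 = weight_memory_gib(params, 1)  # 1 byte/param in INT8

print(f"FP16: {fp16:.1f} GiB, INT8: {int8:.1f} GiB")
# FP16 exceeds a 24 GiB card; INT8 weights fit with room to spare.
```

Whether the remaining ~5 GiB suffices for activations and KV cache at useful batch sizes and sequence lengths is the practical open question the comment raises.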