singularperturbation wrote:
Reply to [R] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models - Massachusetts Institute of Technology and NVIDIA Guangxuan Xiao et al - Enables INT8 for LLM bigger than 100B parameters including OPT-175B, BLOOM-176B and GLM-130B. by Singularian2501
This is highly relevant for my work, I'm very excited about this!
(Ah n/m I assumed the submitter was one of the authors.)
I saw that you've uploaded activation scales (equation 4) for a number of models, but when computing these for a new model, how exactly is the calibration dataset used? Do you take the per-channel maximum across the entire calibration set, or compute the maximum for each calibration sample individually and then average across samples? I see that:

> Code will be released at: https://github.com/mit-han-lab/smoothquant in ~2 weeks.
So I guess I may just need to be patient until this is released, lol.
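In the meantime, here's a minimal PyTorch sketch of the two collection strategies I'm asking about. The function and mode names are my own, not from the paper or the upcoming repo; if I'm reading equation 4 right, the per-channel max |X_j| gathered here would be the activation term in s_j = max(|X_j|)^α / max(|W_j|)^(1−α):

```python
import torch

@torch.no_grad()
def collect_act_scales(model, target_layer, calib_batches, mode="global_max"):
    """Collect per-channel activation scales for one nn.Linear layer.

    mode="global_max":  max |x| per input channel over the whole calibration set.
    mode="mean_of_max": per-batch max |x|, then averaged across batches.
    (These are the two alternatives from my question -- naming is mine.)
    """
    per_batch_max = []

    def hook(module, inputs, output):
        x = inputs[0].detach()                     # (..., in_features)
        x = x.reshape(-1, x.shape[-1]).abs()       # flatten batch/token dims
        per_batch_max.append(x.max(dim=0).values)  # per-channel max |x|

    handle = target_layer.register_forward_hook(hook)
    for batch in calib_batches:
        model(batch)                               # forward passes only, no grads
    handle.remove()

    stacked = torch.stack(per_batch_max)           # (n_batches, in_features)
    if mode == "global_max":
        return stacked.max(dim=0).values           # one max over all samples
    return stacked.mean(dim=0)                     # average of per-batch maxima
```

Either way gives one scale per input channel, but outlier-heavy samples would dominate the global max while getting diluted by averaging, which is why I'm curious which one the released code will use.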