Submitted by Haghiri75 t3_11wdi8m in deeplearning
Haghiri75 OP t1_jcxt80d wrote
I think I found the reason. The Dalai system quantizes the models, which makes them incredibly fast, but the cost of that quantization is reduced coherence.
Jaffa6 t1_jd03par wrote
That's odd.
Quantisation should take the weights from (e.g.) 32-bit floats down to 16-bit floats, but I wouldn't expect that to cost much coherence at all. Did they say somewhere that that's why?
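As a rough illustration of the precision drop being discussed, here is a minimal NumPy sketch. It is not Dalai's actual code, and the 4-bit-style round trip is only an assumed example of a more aggressive scheme; it just compares how much rounding error float16 and a crude 4-bit quantization introduce relative to the original 32-bit weights.

    # Illustrative only; not Dalai's pipeline.
    import numpy as np

    rng = np.random.default_rng(0)
    weights_fp32 = rng.standard_normal(1_000_000).astype(np.float32)

    # The "32-bit -> 16-bit" case mentioned above.
    weights_fp16 = weights_fp32.astype(np.float16)

    # A crude 4-bit-style round trip: scale to 16 integer levels and back,
    # showing that aggressive quantization loses more information than float16.
    scale = np.abs(weights_fp32).max() / 7
    weights_int4 = np.clip(np.round(weights_fp32 / scale), -8, 7)
    weights_deq = weights_int4 * scale

    print("fp16 mean abs error: ", np.abs(weights_fp32 - weights_fp16.astype(np.float32)).mean())
    print("4-bit mean abs error:", np.abs(weights_fp32 - weights_deq).mean())

The point of the comparison is that the coherence hit depends heavily on how far the precision is reduced, not on quantization per se.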
Haghiri75 OP t1_jd32s29 wrote
Apparently I was wrong; the problem isn't only quantization. It's also that the model isn't Stanford's Alpaca but another Alpaca-like model. That's all I can say for sure at this point.