remghoost7 t1_jbzro03 wrote
Reply to comment by The_frozen_one in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
Nice!
How's the generation speed...?
The_frozen_one t1_jbzv0gt wrote
It takes about 7 seconds to generate a full response to a prompt with the 13B model, using the default number of predicted tokens (128).
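If it actually generates all 128 tokens, that works out to roughly 18 tokens per second. For anyone who wants to measure this on their own hardware, here's a rough timing sketch (not the bot's actual code: the model name and settings are placeholders, and the real bot runs a 4-bit quantized build rather than plain fp16, so it loads differently):

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model name for illustration; swap in whatever LLaMA-13B
# checkpoint / quantized build you actually have available.
model_name = "decapoda-research/llama-13b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Hello! How's it going?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.time()
# 128 matches the default number of predicted tokens mentioned above.
output_ids = model.generate(**inputs, max_new_tokens=128)
elapsed = time.time() - start

new_tokens = output_ids.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens} tokens in {elapsed:.1f}s ({new_tokens / elapsed:.1f} tok/s)")
```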