
atlast_a_redditor t1_jc3jzcf wrote

I know nothing about this stuff, but I'd rather have the 4-bit 13B model for my 3060 12GB. From what I've read, quantisation has less effect on larger models.

17
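The intuition that a 4-bit 13B model fits in 12 GB checks out with back-of-envelope arithmetic (a sketch only; in practice the KV cache and activations add overhead on top of the weights):

```python
# Rough VRAM estimate for the weights of a 4-bit quantised 13B model.
params = 13e9            # 13 billion parameters
bits_per_weight = 4      # 4-bit quantisation
weight_bytes = params * bits_per_weight / 8
weight_gib = weight_bytes / 2**30
print(f"~{weight_gib:.1f} GiB of weights")  # ~6.1 GiB, leaving headroom on a 12 GB card
```

At 16-bit precision the same model would need ~24 GiB for weights alone, which is why quantisation is what makes it fit on consumer cards.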

disgruntled_pie t1_jc4ffo1 wrote

I’ve successfully run the 13B parameter version of LLaMA on my 2080 Ti (11GB of VRAM) in 4-bit mode and performance was pretty good.

20

pilibitti t1_jc56vv5 wrote

hey do you have a link for how one might set this up?

6

disgruntled_pie t1_jc5g6or wrote

I’m using this project: https://github.com/oobabooga/text-generation-webui

The project’s GitHub wiki has a page on LLaMA that explains everything you need.

23

pdaddyo t1_jc5uoly wrote

And if you get stuck check out /r/oobabooga

6