
Acceptable-Cress-374 t1_j03yizm wrote

Could you use this to run inference on GPT-NeoX using 2-4 computers with 3090s? IIRC it requires ~40 GB of VRAM at inference, and multiples of that for finetuning...


hx-zero OP t1_j03zy85 wrote

Yes, it's technically possible to integrate GPT-NeoX into our code instead of BLOOM (it requires some work, but it's not too hard).
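
For context, here's roughly what the Petals client call looks like for BLOOM today (a minimal sketch based on the public `petals` package; the exact class name and model ID may differ depending on the version you install). A GPT-NeoX port would need an analogous distributed wrapper class:

```python
# Minimal sketch of distributed inference with the Petals client (BLOOM).
# Assumes the `petals` package exposes DistributedBloomForCausalLM and that
# the public swarm serves "bigscience/bloom-petals"; a GPT-NeoX port would
# need a similar wrapper around its own blocks.
from transformers import BloomTokenizerFast
from petals import DistributedBloomForCausalLM

MODEL_NAME = "bigscience/bloom-petals"

tokenizer = BloomTokenizerFast.from_pretrained(MODEL_NAME)
model = DistributedBloomForCausalLM.from_pretrained(MODEL_NAME).cuda()

inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"].cuda()
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))
```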

Also, it may be possible to fit GPT-NeoX into 20 GB of VRAM (i.e., one 3090) using the recent LLM.int8() work: https://huggingface.co/blog/hf-bitsandbytes-integration. We use this approach to make BLOOM consume as little memory as possible in Petals.
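
A rough sketch of what that looks like with the HF + bitsandbytes integration from the linked blog post (assuming `bitsandbytes` and `accelerate` are installed; the ~20 GB figure is an estimate, not a guarantee):

```python
# Hedged sketch: load GPT-NeoX-20B with 8-bit (LLM.int8()) weights via the
# transformers + bitsandbytes integration, sharding layers across available
# devices with accelerate.
from transformers import AutoTokenizer, AutoModelForCausalLM

name = "EleutherAI/gpt-neox-20b"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    device_map="auto",   # place layers on available GPUs/CPU automatically
    load_in_8bit=True,   # LLM.int8() weight quantization
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=10)[0]))
```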
