Submitted by poppear t3_11ozl85 in MachineLearning
I put together this plain PyTorch implementation of LLaMA (I just substituted the fairscale layers with the native ones and converted the weights accordingly) so it can be run more easily in different environments.
The big problem with the official implementation is that it hard-codes the GPU count: the 65B model requires 8 GPUs no matter what, the 30B model requires 4, and so on. In reality you can easily fit the 65B model in two A100s with 100 GB of VRAM.
vanilla-llama solves this problem. As long as you have enough total memory, the model will be loaded across all the available GPUs.
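The core idea (fit the model into whatever GPUs are present instead of a fixed count) can be sketched as a greedy layer-to-device assignment. This is a hypothetical illustration of the technique, not vanilla-llama's actual code; `plan_device_map` and its arguments are made-up names for the example:

```python
def plan_device_map(layer_sizes_gb, gpu_capacities_gb):
    """Assign each layer to a GPU in order, spilling to the next GPU
    once the current one is full.

    Hypothetical helper for illustration only -- not vanilla-llama's API.
    layer_sizes_gb: per-layer weight size in GB.
    gpu_capacities_gb: usable VRAM per GPU in GB.
    Returns {layer_index: gpu_index}.
    """
    device_map = {}
    gpu = 0      # GPU currently being filled
    used = 0.0   # GB already placed on that GPU
    for i, size in enumerate(layer_sizes_gb):
        # Move to the next GPU if this layer would overflow the current one.
        if gpu < len(gpu_capacities_gb) and used + size > gpu_capacities_gb[gpu]:
            gpu += 1
            used = 0.0
        if gpu >= len(gpu_capacities_gb):
            raise MemoryError("not enough total GPU memory for the model")
        device_map[i] = gpu
        used += size
    return device_map


# Example: 10 layers of 2 GB each spread over three 8 GB GPUs.
print(plan_device_map([2.0] * 10, [8.0, 8.0, 8.0]))
```

With a map like this, each layer's weights are moved to their assigned device at load time, and activations are transferred between devices at the GPU boundaries during the forward pass (this is the same general strategy libraries like Hugging Face Accelerate use with `device_map="auto"`).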
PuzzledWhereas991 t1_jbvcei5 wrote
Which model can I run with two 3060 Tis (8 GB) and one 3080 Ti (12 GB)?