[deleted] t1_jacx9ai wrote
Reply to comment by abnormal_human in [R] Microsoft introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot) by MysteryInc152
That’s about 100x less than what I’d expected.
Beli_Mawrr t1_jad4r9n wrote
That's almost in the realm of what my computer can run, no?
curiousshortguy t1_jad9s4t wrote
It is. You can probably run 2 to 8 billion parameters on your average gaming PC, and 16 billion on a high-end one.
AnOnlineHandle t1_jaeshwf wrote
Is there a way to convert parameter count into VRAM requirements? Presuming that's the main bottleneck?
metal079 t1_jaeuymi wrote
Rule of thumb is VRAM needed = 2 GB per billion parameters, though I recall Pygmalion, which is 6B, says it needs 16 GB of RAM, so it depends.
curiousshortguy t1_jaf3aab wrote
Yeah, about 2-3 GB per billion parameters. You can easily offload layers of the network to disk and then load even larger models that don't fit in VRAM, but disk I/O will make inference painfully slow.
new_name_who_dis_ t1_jaf4lmy wrote
Each float32 is 4 bytes.
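To make the arithmetic behind the rule of thumb concrete, here is a rough sketch that multiplies parameter count by bytes per parameter. The function name `weight_memory_gb` and the dtype table are made up for illustration, and the estimate covers the weights only; activations, KV cache and framework overhead come on top, which is probably why real requirements land above the 2 GB per billion figure.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(n_params: float, dtype: str = "float16") -> float:
    """Estimate memory for the weights alone, in decimal GB (1 GB = 1e9 bytes).

    Hypothetical helper: it ignores activations, KV cache and any
    framework overhead, so treat the result as a lower bound.
    """
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    # A 6B-parameter model at each precision.
    for dtype in BYTES_PER_PARAM:
        print(f"6B params in {dtype}: {weight_memory_gb(6e9, dtype):.0f} GB")
```

For 6B parameters this gives 24 GB in float32, 12 GB in float16 and 6 GB in int8. 12 GB is about 11.2 GiB, which lines up with the roughly 11 GB figure quoted for GPT-J elsewhere in the thread.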
abnormal_human t1_jad6qae wrote
Yeah, probably.
dancingnightly t1_jadj7fa wrote
Edit: Seems like for this one, yes. They do account for human instructions (similar to the goal of RLHF, which requires more RAM) by adding them directly to the text dataset, as mentioned in section 3.3, Language-Only Instruction Tuning.
For other models, like the upcoming OpenAssistant, one thing to note is that although the generative model itself may be runnable locally, the reward model (the bit that "adds finishing touches" and ensures the model follows instructions) can be much bigger. Even if the underlying GPT-J model takes 11 GB of RAM at 6B parameters, RLHF could seriously increase that.
This model is in the realm of the smaller T5, BART and GPT-2 models released 3 years ago, which were runnable on decent gaming GPUs even then.
currentscurrents t1_jaetyg1 wrote
Can't the reward model be discarded at inference time? I thought it was only used for fine-tuning.
currentscurrents t1_jaetvbb wrote
Definitely in the realm of running on your computer. Almost in the realm of running on high-end smartphones with TPUs.