yahma t1_jdiqmj2 wrote on March 24, 2023 at 6:00 PM

This may be the size of the datasets, but it's hard to say how many parameters will be needed for a good LLM that's just really good at explaining code.

yahma t1_j2ssc01 wrote on January 3, 2023 at 6:31 PM

Reply to comment by C0hentheBarbarian in [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon

I wasn't very impressed with BLOOMZ. Its responses seem short and optimized for Q&A-style output. Perhaps its zero-shot and one-shot performance was better than BLOOM's, but BLOOM seemed to produce better output for stories and writing in general.

I was only able to test the 6B models, though, so I'm not sure how the 176B models compare.

yahma t1_j2ss1ox wrote on January 3, 2023 at 6:29 PM

Reply to [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon

So with pruning and 8-bit quantization, are we able to run BLOOM-176B on a single GPU yet?

yahma t1_j1dulgw wrote on December 23, 2022 at 4:02 PM

Reply to [D] When chatGPT stops being free: Run SOTA LLM in cloud by _underlines_

Based on my testing, none of the open-source models are anywhere near as good as ChatGPT (or even text-davinci-003, the latest GPT-3 snapshot).

I think open-source models need more fine-tuning and some RL techniques applied to them before they get anywhere close.