BellyDancerUrgot t1_jdx6w01 wrote
Reply to comment by ChingChong--PingPong in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
The implication was "most of the accessible textual data", which is true. The exaggeration was framed that way because it's a language model first and foremost, and previous iterations like GPT-3 and 3.5 were not multimodal. Also, as far as accounts go, that's a huge '?' at the moment, especially going by tweets like these:
https://twitter.com/katecrawford/status/1638524011876433921?s=46&t=kwpwSgfnJvGe6J-1CEe_5Q
The reality is, neither we nor you have the slightest clue what it was trained on, and MSFT has sufficient compute to train on all of the text data on the internet.
When it comes to multimodal media, we don't really need to train a model on the same amount of data as is required for text.
BellyDancerUrgot t1_jdtci38 wrote
Reply to comment by StrippedSilicon in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Well, let me ask you: how does it fail simple problems if it can solve more complex ones? If you were solving these problems analytically, it stands to reason that you would never make an error on a question that simple.
BellyDancerUrgot t1_jds7yao wrote
Reply to comment by suflaj in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
"Empty vessels make much noise" seems to be a quote you live by. I'll let the readers of this thread determine who between us has contributed to the discussion and who writes extensively verbose commentary, ironically, with zero content.
BellyDancerUrgot t1_jds7iva wrote
Reply to comment by StrippedSilicon in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
The reason I say it's recontextualization and lacks deeper understanding is that it doesn't hallucinate sometimes; it hallucinates all the time, and sometimes the hallucinations align with reality, that's all. Take this thread for example:
- https://twitter.com/ylecun/status/1639685628722806786?s=48&t=kwpwSgfnJvGe6J-1CEe_5Q
- https://twitter.com/stanislavfort/status/1639731204307005443?s=48&t=kwpwSgfnJvGe6J-1CEe_5Q
- https://twitter.com/phillipharr1s/status/1640029380670881793?s=48&t=kwpwSgfnJvGe6J-1CEe_5Q
A system that fully understood the underlying structure of the question would not give you varying answers with the same prompt.
"Inconclusive" is its third likeliest answer. Despite a strong bias toward the correct answer (keywords like "dubious", for example), it still makes mistakes on a rather simple question. Sometimes it gets it right with the bias, sometimes even without it.
Language, imo, lacks causality for intelligence, since it's a mere byproduct of intelligence. That's why these models, imo, hallucinate all the time: sometimes the hallucinations line up with reality and sometimes they don't, and the likelihood of the former is just increased by the huge training set.
BellyDancerUrgot t1_jdpbtyo wrote
Reply to comment by Appropriate_Ant_4629 in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Oh, I'm sure it had the data. I tested them on a few different things: OOP, some basic CNN math, some philosophy, some literature reviews, some paper summarization. The last two were really bad. One mistake in the CNN math, one mistake in the OOP. Creative things like writing essays, technical troubleshooting problems, even niche stuff like how I could shunt a GPU, it managed to answer correctly.
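(To give a sense of what I mean by "basic CNN math": the standard output-size arithmetic, the kind of thing a model either gets right or doesn't. This is an illustrative sketch, not one of my actual prompts:)

```python
def conv2d_out_size(w: int, k: int, p: int = 0, s: int = 1) -> int:
    """Spatial output size of a conv layer: floor((W - K + 2P) / S) + 1."""
    return (w - k + 2 * p) // s + 1

# e.g. a 224x224 input through a 7x7 conv, stride 2, padding 3 -> 112x112
print(conv2d_out_size(224, k=7, p=3, s=2))  # 112
```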
I think people have the idea that I think GPT is shit. On the contrary, I think it's amazing. Just not the holy angel and elixir of life that AI influencers peddle it as.
BellyDancerUrgot t1_jdpb9pi wrote
Reply to comment by nixed9 in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
I agree that Bing Chat is not nearly as good as ChatGPT-4, and I already know everyone is going to cite that paper as a counter to my argument. But that paper isn't reproducible, I don't even know if it's peer reviewed, it's lacking a lot of details, and it has a lot of conjecture. It's bad literature. So even though the claims are hype, I take them with a bucketful of salt. A lot of scientists I follow in this field have mentioned that even though the progress is noticeable in terms of managing misinformation, it's an incremental improvement and nothing truly groundbreaking.
Not saying OpenAI is 100% lying. But this thread by Kate Crawford (MSFT Research) is a good example of what researchers actually think of claims like these and some of their dangers: https://twitter.com/katecrawford/status/1638524011876433921?s=46&t=kwpwSgfnJvGe6J-1CEe_5Q
Until I use it for myself I won't know, and I'll have to rely on what I've heard from other PhDs, master's students, postdocs, and professors. Personally, the only things I can compare are ChatGPT and Bing Chat, and both have been far less than stellar in my experience.
BellyDancerUrgot t1_jdpa0mz wrote
Reply to comment by StrippedSilicon in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Tbf, I think I went a bit too far when I said it has everything memorized. But it does have access to an internet's worth of contextual information on basically everything that has ever existed. So even though it's wrong to say it's 100% memorization, it's still just intelligently regurgitating information it has learnt in a new context. Being able to recontextualize information isn't a small feat, mind you. I think GPT is amazing, just like I found the original diffusion paper and WGANs to be. It's just really overhyped to be something it isn't, and it fails quite spectacularly on logical and factual queries: it cites things that don't exist, makes simple mistakes yet solves more complex ones. A telltale sign of a model lacking a fundamental understanding of the subject.
BellyDancerUrgot t1_jdp945d wrote
Reply to comment by suflaj in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Claim, since you managed to get lost in your own comment:
Gpt hallucinates a lot and is unreliable for any factual work. It’s useful for creative work when the authenticity of its output doesn’t have to be checked.
Your wall of text can be summarized as "I'm going to debate you by suggesting no one knows the definition of AGI": the living embodiment of the saying "empty vessels make much noise". No one knows the definition of intuition, but what we do know is that memory plays no part in it; understanding causality does.
It's actually hilarious that you bring up source citation as some kind of trump card after I pointed out that everything you know about GPT-4 is something someone told you to believe, without any real, discernible, reproducible evidence.
Instead of asking me to spoon-feed you, spend a whole 20 seconds googling:
https://twitter.com/random_walker/status/1638525616424099841?s=46&t=kwpwSgfnJvGe6J-1CEe_5Q
https://twitter.com/chhillee/status/1635790330854526981?s=46&t=kwpwSgfnJvGe6J-1CEe_5Q
https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
https://aiguide.substack.com/p/did-chatgpt-really-pass-graduate
"I don't quite get how it works" + "it surprises me" ≠ "it could maybe be sentient if I squint".
Thank you for taking the time to write two paragraphs pointing out my error in using the phrase "aces leetcode" after I had already acknowledged and corrected the mistake myself; maybe you have some word quota you were trying to fulfill. Inference time depending on the length of the output sequence has been a constant since the first attention paper, let alone the first transformer paper. My point is that it's good at solving leetcode problems that were present in its training set.
PS: also kindly refrain from passing remarks on my understanding of the subject when the only arguments you can make are refutations without intellectual dissent. It's quite easy to say "no, I don't believe you, prove it" while not being able to distinguish between Q, K, and V if they hit you in the face.
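(For anyone following along: Q, K, and V are the query, key, and value tensors in attention. Here's a minimal sketch of scaled dot-product attention with made-up toy dimensions; note that autoregressive generation runs a forward pass per new token, which is exactly why inference time scales with output length:)

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # (..., seq_q, seq_k)
    weights = F.softmax(scores, dim=-1)            # rows sum to 1
    return weights @ v                             # (..., seq_q, d_v)

# toy example: batch 1, 8 heads, 10 tokens, head dim 64
q = torch.randn(1, 8, 10, 64)
k = torch.randn(1, 8, 10, 64)
v = torch.randn(1, 8, 10, 64)
out = scaled_dot_product_attention(q, k, v)  # shape (1, 8, 10, 64)
```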
BellyDancerUrgot t1_jdns4yg wrote
Reply to comment by suflaj in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
- Paper summarization and factual analysis of 3D generative models, basic math, and basic OOP understanding were the broad topics I experimented with. I'm not giving you the exact prompts, but you're free to evaluate it yourselves.
- Wrong choice of words on my part: when I said "ace" I meant that it does really well on leetcode questions from before 2021 and is abysmal after. Also, the ones it does solve, it solves really fast; in a test a few weeks ago it solved 3 questions pretty much instantly, which by itself would have placed it in the top 10% of competitors.
- "Unbiased" implies being tested on truly unseen data, of which there is far less, considering the size of the training data used. Many of the examples cited in the new "Sparks of AGI" paper are not even reproducible.
  https://twitter.com/katecrawford/status/1638524011876433921?s=46&t=kwpwSgfnJvGe6J-1CEe_5Q
- Insufficient because, as I said: no world model, no intuition, only memory. Which is why it hallucinates.
- Intuition is understanding the structure of the world without having to memorize the entire internet. A good analogy is how a child isn't taught how gravity works when they first start walking, or how you can lack knowledge of a subject and still infer from your understanding of the underlying concepts. These are things you inherently cannot test or quantify when evaluating models like GPT that have been trained on everything, when you still don't even know what they've been trained on.
- You can keep daring me, and I don't care, because I have these debates with fellow researchers in the field; I'm always up for a good debate if I have time. I'm not even an NLP researcher, and even I know the existential dread creeping in on NLP researchers because of how esoteric these results are and how AI influencers have blown things out of proportion, citing cherry-picked results that aren't reproducible because you don't know how to reproduce them.
- There is no real way an unbiased scientist reads the new "Sparks of AGI" paper and goes, "oh look, GPT-4 is solving AGI".
- Going back on what I said earlier: yes, there is always the possibility that I'm wrong and GPT is indeed a stepping stone to AGI, but we don't know, because the only results you have access to are not very convincing. And on a user level, it has failed to impress me beyond being a really good chatbot that can do some creative work.
BellyDancerUrgot t1_jdno8w6 wrote
Reply to comment by StrippedSilicon in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
That paper is laughable, a meme. My Twitter feed has been spammed by people tweeting about it, and as someone in academia it's sad to see the quality bar for research publications sink this low. I can't believe I'm saying this as a student of deep learning, but Gary Marcus, in his latest blog post, is actually right.
BellyDancerUrgot t1_jdnnuii wrote
Reply to comment by nixed9 in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Iirc, Bing Chat uses GPT-4?
BellyDancerUrgot t1_jdlfsuq wrote
Reply to comment by FirstOrderCat in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Except now it's ClosedAI, and most of the papers they release are laughably esoteric. I know the world will catch up within months to whatever they pioneer, but it's just funny to see this happen after they held such a sanctimonious attitude for so long.
BellyDancerUrgot t1_jdldmda wrote
Reply to comment by FirstOrderCat in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
Funny, because I keep seeing people rave like madmen over GPT-4 and ChatGPT, and I've had a 50-50 hit rate of good results versus hallucinated bullshit with both of them. It isn't even funny. People think it's going to replace programmers and doctors, meanwhile it can't do basic things like cite the correct paper.
Of course it aces tests and leetcode problems it was trained on; it was trained on basically the entire internet. How do you even get an unbiased estimate of test error?
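(To make that concrete: with an open training corpus you could at least run a crude n-gram overlap check for train-test contamination, something like the sketch below. With GPT-4 you can't, because the corpus is secret. The function names and the 13-gram choice here are just illustrative:)

```python
def ngrams(text: str, n: int = 13) -> set[tuple[str, ...]]:
    """Word-level n-grams, a common crude unit for contamination checks."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def looks_contaminated(test_item: str, train_docs: list[str], n: int = 13) -> bool:
    """Flag a test item if any of its n-grams appears verbatim in the training data."""
    test_grams = ngrams(test_item, n)
    return any(test_grams & ngrams(doc, n) for doc in train_docs)
```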
That doesn't mean it isn't impressive. It's just one huge block of really good associative memory. But it doesn't even begin to approach the foothills of AGI, imo. No world model. No intuition. Just memory.
BellyDancerUrgot t1_jdld7w4 wrote
LLMs as they exist today are not the route to AGI.
And no, I don't think fine-tuning by itself is reason enough to choose some other model over the top LLMs. For most LLM-focused work, API calls to the big ones will be enough.
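(For illustration, the API-call route is only a few lines; this sketch assumes the openai Python package as it existed around GPT-4's launch, with a hypothetical prompt:)

```python
import openai  # pip install openai (pre-1.0 interface, current at the time)

openai.api_key = "sk-..."  # your key here

# a single chat-completion call covers most "fine-tune vs. API" use cases
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Summarize this paper: ..."}],
    temperature=0.2,  # lower temperature for more deterministic output
)
print(response["choices"][0]["message"]["content"])
```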
BellyDancerUrgot t1_jbrn733 wrote
Reply to comment by ScottHameed in Desktop Computer or some other way to train neural networks? by ScottHameed
Personally I haven't faced an issue, but a more experienced person told me he has run into some issues with AMD chips on some frameworks. That's all, nothing major.
BellyDancerUrgot t1_jbrlatc wrote
Reply to comment by ScottHameed in Desktop Computer or some other way to train neural networks? by ScottHameed
I use a 5800X, but for deep learning I'd recommend Intel for maximum stability.
BellyDancerUrgot t1_jbrl8kw wrote
Reply to comment by ScottHameed in Desktop Computer or some other way to train neural networks? by ScottHameed
Facebook Marketplace: inspect the product and bargain. I got myself a brand-new 3090, still with the peel on it, for $750 CAD.
BellyDancerUrgot t1_jbrehr8 wrote
Get a 3090 or a 4090 for the VRAM. For the short term, cloud is always better because it's cheaper. For the long term, having a GPU is incredibly handy, and cloud expenses build up, so a local GPU becomes more economical.
Imo the 3080 isn't worth it because 10 GB of VRAM is too little.
BellyDancerUrgot t1_j8gmune wrote
Reply to MacBook Air vs Pro by Fun-Cartographer8611
PyTorch MPS is buggy, even with the stable build; something with CUDA is far better, imo. Personally I use a 14-inch MacBook Pro with the base M1 Pro for literally everything, and I also have a desktop (I had one anyway because I play games; I just upgraded it with a cheap 3090 I found online) that works like a charm for 99% of training workloads.
For the 1% where I don't have enough compute, I use my university's cluster or Compute Canada for distributed training.
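(If you end up juggling CUDA, MPS, and CPU across machines like this, a small fallback helper keeps scripts portable; just a generic sketch, not tied to any particular setup:)

```python
import torch

def pick_device() -> torch.device:
    """Prefer CUDA, fall back to Apple MPS, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
model = torch.nn.Linear(128, 10).to(device)  # any module moves the same way
```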
BellyDancerUrgot t1_j8e6rq6 wrote
The top reply covers it well. Also, Jeff Heaton might make a video on the RTX 6000, since he just posted an unboxing recently; you might want to check that out in case he goes into detail.
BellyDancerUrgot t1_j8111wa wrote
Reply to My Machine Learning: Learning Program by Learning_DL
It varies from person to person, but imo, given that you have it planned out, you're likelier to succeed than not if you stick with what you came up with.
BellyDancerUrgot t1_j7f0u7u wrote
Reply to comment by beautyofdeduction in Why does my Transformer blow GPU memory? by beautyofdeduction
Okay, yeah, I don't know what I was typing; yes, ~0.176 GB for just the parameters. You still have to account for dense representations of long sequences (times 8, one per attention head), plus activations and gradients, all multiplied by the number of layers. There's a formula to approximate the total that I read somewhere online. Activations, I think, take up way more memory than the model itself.
The memory requirement is roughly in line with most mid-size transformer models, I think.
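(A back-of-the-envelope version of that kind of estimate; every constant below is an assumption for illustration, not a measured figure for any particular model:)

```python
def training_memory_gb(
    n_params: int,
    bytes_per_el: int = 8,        # float64 here; float32 would be 4
    optimizer_states: int = 2,    # e.g. Adam keeps two moments per param
    batch: int = 32,
    seq_len: int = 512,
    d_model: int = 512,
    n_heads: int = 8,
    n_layers: int = 6,
    acts_per_layer: int = 12,     # crude count of stored activation tensors
) -> float:
    """Very rough training-memory estimate: params + grads + optimizer + activations."""
    weights = n_params * bytes_per_el
    grads = n_params * bytes_per_el                      # one gradient per weight
    optim = n_params * bytes_per_el * optimizer_states
    acts = batch * seq_len * d_model * bytes_per_el * acts_per_layer * n_layers
    attn = batch * n_heads * seq_len**2 * bytes_per_el * n_layers  # score maps
    return (weights + grads + optim + acts + attn) / 1e9

print(f"~{training_memory_gb(22_000_000):.1f} GB")  # dominated by activations
```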
BellyDancerUrgot t1_j7eq93o wrote
Reply to comment by beautyofdeduction in Why does my Transformer blow GPU memory? by beautyofdeduction
Each float64 is 8 bytes (a float32 is 4), and you said you have 22M parameters, so the weights alone are 22M × 8 bytes ≈ 0.176 GB.
Also, besides your params and activations, you still have gradients, plus the attention maps are materialized for each head, so multiply those by 8 as well.
For context, I think DeepLabv3, which iirc is a model with 58M parameters, was trained on 8 V100s.
BellyDancerUrgot t1_j7ec7oj wrote
~83 GB, I think, not 500 MB.
BellyDancerUrgot t1_jedgb9t wrote
Reply to Any advanced and updated DL courses? by nuquichoco
What I usually do is read the most noteworthy research papers and then check the implementations on GitHub. I'm taking Aaron Courville's classes and they're good, but without going through a degree the best choice would probably be Karpathy (plus all the links shared by u/verboseEqualsTrue).