BellyDancerUrgot

BellyDancerUrgot t1_jdx6w01 wrote

The implication was that it was trained on most of the accessible textual data, which is true. The exaggeration reads that way because it's a language model first and foremost, and previous iterations like GPT-3 and 3.5 were not multimodal. Also, as far as accounts of what it was trained on go, that's a huge '?' at the moment, especially going by tweets like these:

https://twitter.com/katecrawford/status/1638524011876433921?s=46&t=kwpwSgfnJvGe6J-1CEe_5Q

The reality is, neither we nor you have the slightest clue what it was trained on, and Microsoft has sufficient compute to train on all of the text data on the internet.

When it comes to multimodal media, we don't really need to train a model on the same amount of data as is required for text.

1

BellyDancerUrgot t1_jds7iva wrote

The reason I say it's recontextualization and lacks deeper understanding is that it doesn't hallucinate sometimes; it hallucinates all the time, and sometimes the hallucinations happen to align with reality, that's all. Take this thread for example:

  1. https://twitter.com/ylecun/status/1639685628722806786?s=48&t=kwpwSgfnJvGe6J-1CEe_5Q

  2. https://twitter.com/stanislavfort/status/1639731204307005443?s=48&t=kwpwSgfnJvGe6J-1CEe_5Q

  3. https://twitter.com/phillipharr1s/status/1640029380670881793?s=48&t=kwpwSgfnJvGe6J-1CEe_5Q

A system that fully understood the underlying structure of the question would not give you varying answers to the same prompt.

'Inconclusive' is only its third most likely answer. Despite the prompt having a strong bias toward the correct answer (keywords like 'dubious', for example), it still makes mistakes on a rather simple question. Sometimes it gets it right with the bias, sometimes even without the bias.
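(A minimal sketch of the kind of consistency check I mean; `ask_model` here is a hypothetical stand-in for whatever chat UI or API you query, and the verdict strings are made-up placeholders:)

```python
import random
from collections import Counter

def ask_model(prompt: str) -> str:
    # Hypothetical stand-in: in practice this would send `prompt` to the model
    # and parse its verdict. The possible answers below are made-up placeholders.
    return random.choice(["correct", "incorrect", "inconclusive"])

prompt = "..."  # the same riddle-style question from the threads above, sent verbatim each time
answers = Counter(ask_model(prompt) for _ in range(20))
print(answers)  # a model that truly understood the question shouldn't split across answers
```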

Language, imo, has no causal link to intelligence; it's merely a byproduct of intelligence. Which is why these models, imo, hallucinate all the time: sometimes the hallucinations line up with reality and sometimes they don't, and the likelihood of the former is simply increased by the huge training-set size.

1

BellyDancerUrgot t1_jdpbtyo wrote

Oh, I'm sure it had the data. I tested them on a few different things: OOP, some basic CNN math, some philosophy, some literature reviews, some paper summarization. The last two were really bad. One mistake in the CNN math, one mistake in OOP. Creative things like writing essays, solving technical troubleshooting problems, even niche stuff like how I could shunt a GPU, it managed to answer correctly.

I think people have the idea that I think GPT is shit. On the contrary, I think it's amazing. Just not the holy grail and elixir of life that AI influencers peddle it as.

1

BellyDancerUrgot t1_jdpb9pi wrote

I agree that Bing Chat is not nearly as good as GPT-4, and I already know everyone is going to cite that paper as a counter to my argument, but that paper isn't reproducible, I don't even know if it's peer reviewed, it's lacking a lot of details, and it's full of conjecture. It's bad literature. Hence, even though the claims are hyped, I take them with a bucketful of salt. A lot of scientists I follow in this field have mentioned that even though the progress is noticeable in terms of managing misinformation, it's an incremental improvement and nothing truly groundbreaking.

Not saying OpenAI is 100% lying. But this thread https://twitter.com/katecrawford/status/1638524011876433921?s=46&t=kwpwSgfnJvGe6J-1CEe_5Q by Kate Crawford (MSFT Research) is a good example of what researchers actually think of claims like these and some of their dangers.

Until I use it for myself I won't know, so I have to rely on what I've heard from other PhDs, master's students, postdocs, and professors. The only things I can compare against are ChatGPT and Bing Chat, and both have been far less than stellar in my experience.

1

BellyDancerUrgot t1_jdpa0mz wrote

Tbf, I think I went a bit too far when I said it has everything memorized. But it does have access to an internet's worth of contextual information on basically everything that has ever existed. So even though it's wrong to say it's 100% memorization, it's still just intelligently regurgitating information it has learnt, with new context. Being able to re-contextualize information isn't a small feat, mind you. I think GPT is amazing, just like I found the original diffusion paper and WGANs to be. It's just really overhyped to be something it isn't, and it fails quite spectacularly on logical and factual queries: it cites things that don't exist, and it makes simple mistakes while solving more complex ones. A telltale sign of a model lacking a fundamental understanding of the subject.

2

BellyDancerUrgot t1_jdp945d wrote

Claim, since you managed to get lost in your own comment:

GPT hallucinates a lot and is unreliable for any factual work. It's useful for creative work, where the authenticity of its output doesn't have to be checked.

Your wall of text can be summarized as, 'I'm gonna debate you by suggesting no one knows the definition of AGI.' The living embodiment of the saying 'empty vessels make the most noise.' No one knows what the definition of intuition is, but what we do know is that memory does not play a part in it; understanding causality does.

It's actually hilarious that you bring up source citation as some kind of trump card after I pointed out that everything you know about GPT-4 is something someone has told you to believe, without any real, discernible, reproducible evidence.

Instead of asking me to spoon-feed you, spend a whole 20 seconds googling:

https://twitter.com/random_walker/status/1638525616424099841?s=46&t=kwpwSgfnJvGe6J-1CEe_5Q

https://twitter.com/chhillee/status/1635790330854526981?s=46&t=kwpwSgfnJvGe6J-1CEe_5Q

https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks

https://aiguide.substack.com/p/did-chatgpt-really-pass-graduate

'I don't quite get how it works' + 'it surprises me' ≠ 'it could maybe be sentient if I squint'.

Thank you for taking the time to write two paragraphs pointing out my error in using the phrase 'aces leetcode' after I had already acknowledged and corrected the mistake myself; maybe you had some word quota you were trying to fill. Inference time depending on the length of the output sequence has been a constant since the first attention paper, let alone the first transformer paper. My point is that it's good at solving leetcode problems when they're present in the training set.
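(To spell that out: autoregressive decoding emits one token per forward pass, so generation cost grows with output length. A toy sketch below, where the embedding-plus-mean is just a stand-in for a real transformer stack:)

```python
import torch
import torch.nn as nn

vocab, d_model = 100, 32
embed = nn.Embedding(vocab, d_model)
head = nn.Linear(d_model, vocab)

tokens = torch.tensor([[1, 5, 7]])            # hypothetical prompt token ids
for _ in range(20):                           # each generated token costs another full pass
    h = embed(tokens).mean(dim=1)             # stand-in for the real transformer layers
    next_token = head(h).argmax(dim=-1, keepdim=True)
    tokens = torch.cat([tokens, next_token], dim=1)
print(tokens.shape)                           # prompt length + 20 generated tokens -> (1, 23)
```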

PS: also, kindly refrain from passing remarks on my understanding of the subject when the only arguments you can make are refutations of others without any intellectual dissent of your own. It's quite easy to say, 'no, I don't believe you, prove it,' while not being able to tell Q, K, and V apart if they hit you in the face.
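(For reference, what Q, K, and V actually are, as a minimal scaled dot-product attention sketch; the shapes are arbitrary example values:)

```python
import torch
import torch.nn.functional as F

batch, heads, seq, d_k = 2, 8, 16, 64          # arbitrary illustrative shapes

# Q, K, V are just three learned projections of the same token representations.
Q = torch.randn(batch, heads, seq, d_k)
K = torch.randn(batch, heads, seq, d_k)
V = torch.randn(batch, heads, seq, d_k)

scores = Q @ K.transpose(-2, -1) / d_k ** 0.5  # how strongly each token attends to every other
weights = F.softmax(scores, dim=-1)
out = weights @ V                              # weighted mix of the value vectors
print(out.shape)                               # (2, 8, 16, 64)
```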

1

BellyDancerUrgot t1_jdns4yg wrote

  1. Paper summarization and factual analysis of 3D generative models, basic math, and basic OOP understanding were the broad topics I experimented with. I'm not giving you the exact prompts, but you are free to evaluate it yourselves.

  2. Wrong choice of words on my part. When I said 'ace', I meant that it does really well on leetcode questions from before 2021 and is abysmal on ones from after. Also, the ones it does solve, it solves really fast. In a contest a few weeks ago it solved 3 questions pretty much instantly, and that by itself would have placed it in the top 10% of competitors.

  3. Unbiased implies being tested on truly unseen data, of which there is far less considering the size of the training data used. Many of the examples cited in their new 'Sparks of AGI' paper are not even reproducible.

https://twitter.com/katecrawford/status/1638524011876433921?s=46&t=kwpwSgfnJvGe6J-1CEe_5Q

  1. Insufficient because, as I said: no world model, no intuition, only memory. Which is why it hallucinates.

  2. Intuition is understanding the structure of the world without needing to memorize the entire internet. A good analogy is how a child isn't taught how gravity works when they first start walking, or how you can lack knowledge of a subject and still make inferences based on your understanding of the underlying concepts.

These are things you inherently cannot test or quantify when evaluating models like GPT that have been trained on everything, when you still don't even know what it was trained on, lol.

  1. You can keep daring me and I don't care, because I have these debates with fellow researchers in the field and am always up for a good debate when I have time. I'm not even an NLP researcher, and even I can see the existential dread creeping in on NLP researchers because of how esoteric these results are and how AI influencers have blown things out of proportion, citing cherry-picked results that aren't even reproducible because you don't know how they were produced.

  2. There is no real way an unbiased scientist reads OpenAI's new 'Sparks of AGI' paper and goes, 'oh look, GPT-4 is solving AGI.'

  3. Going back to what I said earlier: yes, there is always the possibility that I'm wrong and GPT is indeed the stepping stone to AGI, but we don't know, because the only results you have access to are not very convincing. And at a user level it has failed to impress me beyond being a really good chatbot that can do some creative work.

3

BellyDancerUrgot t1_jdno8w6 wrote

That paper is laughable and a meme. My Twitter feed has been spammed with people tweeting about it, and as someone in academia it's sad to see the quality of research publications sink this low. I can't believe I'm saying this as a student of deep learning, but Gary Marcus, in his latest blog post, is actually right.

1

BellyDancerUrgot t1_jdldmda wrote

Funny, because I keep seeing people rave like madmen over GPT-4 and ChatGPT, and I've had a 50-50 hit rate between good results and hallucinated bullshit with both of them. Like, it isn't even funny. People think it's going to replace programmers and doctors, meanwhile it can't do basic stuff like cite the correct paper.

Of course it aces tests and leetcode problems it was trained on. It was trained on basically the entire internet. How do you even get an unbiased estimate of test error?
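(Concretely, the kind of contamination check I mean: if a benchmark question shares long n-grams with the training corpus, it was effectively seen in training. A rough sketch; the corpus and question strings are placeholders, since we don't actually have access to the real training data:)

```python
def ngrams(text: str, n: int = 8) -> set:
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap(test_item: str, corpus: str, n: int = 8) -> float:
    # Fraction of the test item's n-grams also found in the corpus;
    # a high value suggests the "unseen" problem was in the training data.
    test_grams = ngrams(test_item, n)
    return len(test_grams & ngrams(corpus, n)) / max(len(test_grams), 1)

corpus = "..."    # placeholder: whatever slice of the pre-cutoff web you can get
question = "..."  # placeholder: the exam / leetcode question being checked
print(overlap(question, corpus))
```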

Doesn't mean it isn't impressive. It's just one huge block of really good associative memory. But it doesn't even begin to approach the foothills of AGI, imo. No world model. No intuition. Just memory.

15

BellyDancerUrgot t1_j8gmune wrote

PyTorch MPS is buggy, even with the stable build. Something with CUDA is far better imo. Personally, I use a 14-inch MBP with the base M1 Pro for literally everything, and then I have a desktop (I had one because I play games, and I just upgraded the GPU to a cheap 3090 I found online) that works like a charm for 99% of workloads when it comes to training something.

For the 1% where I don't have enough compute, I use my university's cluster or Compute Canada for distributed training.
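(For anyone who still wants to try MPS, a minimal device-selection sketch using standard PyTorch calls; the tiny model here is just for illustration:)

```python
import torch

# Prefer CUDA, fall back to Apple's MPS backend, then CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

model = torch.nn.Linear(512, 10).to(device)
x = torch.randn(32, 512, device=device)
print(device, model(x).shape)   # runs on whichever backend was picked
```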

6

BellyDancerUrgot t1_j7f0u7u wrote

Okay, yeah, I don't know what I was typing. Yes, ~0.176 GB for just the parameters (22M params × 8 bytes each in float64). You still have to account for dense representations of long sequences, once per each of the 8 attention heads, plus activations and gradients, all multiplied by the number of layers. There was a formula to approximate the value that I read somewhere online. Activations, I think, take up way more memory than the model itself.

The memory requirement is roughly in line with most mid-size transformer models, I think.
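(A rough back-of-the-envelope version of that estimate; the layer count, heads, sequence length, batch size, and model width below are made-up example values, and the activation term is a crude approximation rather than the exact formula:)

```python
# Crude training-memory estimate for a transformer; all numbers are illustrative.
params    = 22e6      # parameter count from the earlier comment
bytes_per = 8         # float64; use 4 for float32
n_layers, n_heads = 6, 8                  # hypothetical
seq_len, batch, d_model = 1024, 32, 512   # hypothetical

weights   = params * bytes_per
gradients = params * bytes_per            # one gradient per parameter
optimizer = 2 * params * bytes_per        # e.g. Adam keeps two moment buffers
# Activations: hidden states per layer plus the (heads x seq x seq) attention maps.
activations = n_layers * batch * (seq_len * d_model + n_heads * seq_len * seq_len) * bytes_per

for name, val in [("weights", weights), ("gradients", gradients),
                  ("optimizer state", optimizer), ("activations (rough)", activations)]:
    print(f"{name:>20}: {val / 1e9:.2f} GB")
```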

3

BellyDancerUrgot t1_j7eq93o wrote

Each float64 is 8 bytes, so the 22M parameters you mentioned come to roughly 176 MB for the weights alone.

Also, besides your params and activations, you still have gradients, plus the sequences are mapped once per attention head, so multiply that by 8 as well.

For context: I think DeepLabV3, which iirc has about 58 million parameters, was trained on 8 V100s.

Edit: I clearly had a stroke while writing the first part, so ignore it.

1