Single_Blueberry t1_jcjr43p wrote on March 17, 2023 at 10:16 AM

#2,251,874

Yes, it's well known that current language models are pretty bad at math.

LcuBeatsWorking t1_jcjref0 wrote on March 17, 2023 at 10:20 AM

#2,251,892

There are tons of mistakes in these models, and many are more subtle. That's why people should stop hyping it so much. It's not just math.

Available_Lion_652 t1_jcjrfnx wrote on March 17, 2023 at 10:20 AM

#2,251,894

Replying to Single_Blueberry (#2,251,874)

I know that autoregressive models hallucinate, but training them on an enormous clean corpus of probably several trillion tokens and images, and the fact that GPT 4 may be two magnitude orders bigger than GPT 3, didn't change the problem. The model still hallucinates.

pobtastic t1_jcjs0yn wrote on March 17, 2023 at 10:28 AM

#2,251,911

I asked it to rewrite a simple bash script “so it doesn’t look like I stole it” (just for kicks), and all it did was rename functions… Literally everything else, even the comments, was exactly identical… Not very impressive.

Jean-Porte t1_jcjset8 wrote on March 17, 2023 at 10:33 AM

#2,251,929

By that standard, 90+% of humans are dumb.

seba07 t1_jcjsmxi wrote on March 17, 2023 at 10:35 AM

#2,251,932

The model predicts (in a nutshell) the next word in the answer. It simply can't do math. That's just a known limitation.

NotARedditUser3 t1_jcjsqta wrote on March 17, 2023 at 10:37 AM

#2,251,939

Replying to Available_Lion_652 (#2,251,894)

All language models are currently trash at math. It's not an issue of training material, it's a core flaw in how they function.

People have found some success in getting reasonable outputs from language models using language input-output chains, breaking the task up into smaller increments. It's still possible to hallucinate, though. I saw one really good article explaining that even with tool-assisted language chains (where a language model prints a token in one output that calls a function in a PowerShell or Python script, whose result appears in the next input so the model can generate the correct output later on), when a 'trusted' tool in the input returns a funny, unexpected number, the model sometimes still disregards it if it's drastically farther off than what the model's own training leads it to expect the answer to look like.

Which also makes sense: the way a language model works, as we all know, is that it's just calculating which words look appropriate next to each other. Or tokens, to be more exact. The model very likely doesn't see much difference between 123,456,789 and 123,684,849; both probably score about the same when it's looking for an answer to a math question, in that both rank higher than some wildly different answer such as... 4.
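A minimal sketch of the tool-assisted chain described above, assuming a made-up `CALC[...]` convention for tool calls and a stub `fake_model` standing in for a real language model (both hypothetical). The harness evaluates the requested expression with a small arithmetic evaluator and appends the result to the next input. Unlike this stub, a real model can still ignore the `RESULT:` line, which is exactly the failure the comment describes.

```python
import ast
import operator
import re

# Safe evaluator for + - * / ** on numeric literals: the "trusted tool".
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def calc(expr: str):
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError(f"unsupported expression: {expr!r}")
    return ev(ast.parse(expr, mode="eval"))

TOOL_CALL = re.compile(r"CALC\[([^\]]+)\]")  # hypothetical tool-call syntax

def fake_model(context: str) -> str:
    # Hypothetical stand-in for a language model: first requests the tool,
    # then copies the tool's result into its answer once it appears in the input.
    if "RESULT:" not in context:
        return "CALC[123456789 - 123684849]"
    result = context.rsplit("RESULT:", 1)[1].strip()
    return f"The answer is {result}."

def run_chain(question: str, model=fake_model, max_steps: int = 4) -> str:
    context = question
    for _ in range(max_steps):
        out = model(context)
        call = TOOL_CALL.search(out)
        if call is None:
            return out  # plain answer, no tool requested
        # Feed the tool's output back in as part of the next input.
        context += f"\n{out}\nRESULT: {calc(call.group(1))}"
    raise RuntimeError("no final answer within max_steps")

print(run_chain("What is 123456789 - 123684849?"))
```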

Single_Blueberry t1_jcjsxa1 wrote on March 17, 2023 at 10:39 AM

#2,251,942

Replying to Available_Lion_652 (#2,251,894)

>the fact that GPT 4 may be two magnitude orders bigger than GPT 3

I'm not aware of any reliable sources that claim that.

Intuitively, I don't see why it would stop hallucinating. I imagine the corpus, as big as it may be, doesn't contain a lot of examples of the concept of "not knowing the answer".

That's something people use a lot in private conversation, but not in written language on the public internet or in books, which afaik is where most of the data comes from.

NotARedditUser3 t1_jcjsxlo wrote on March 17, 2023 at 10:39 AM

#2,251,943

Replying to pobtastic (#2,251,911)

I think there you just have to be more creative with your prompt, e.g. "I want you to restructure this code so that entirely different methods are called and the comments are different, but the result/output is still effectively the same."

boostwtf t1_jcjszyg wrote on March 17, 2023 at 10:40 AM

#2,251,949

Are we using the term 'hallucinate' now? :D

bacon_boat t1_jcjt0bi wrote on March 17, 2023 at 10:40 AM

#2,251,951

I know, that conditional probability distribution P(next_word|words) is really *dumb*.
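Taking the joke literally: what a language model estimates is the probability of the next token given the preceding ones. A toy bigram version, which conditions on only one previous word (real LLMs condition on thousands of tokens with a neural network, not raw counts), shows why a fluent continuation is not a computed answer. The corpus below is invented for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(corpus: str):
    """Estimate P(next_word | previous_word) from raw bigram counts."""
    words = corpus.split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return {
        prev: {w: c / sum(nexts.values()) for w, c in nexts.items()}
        for prev, nexts in counts.items()
    }

def most_likely_next(model, word: str):
    nexts = model.get(word)
    return max(nexts, key=nexts.get) if nexts else None

model = train_bigram("two plus two is four . two plus three is five . two plus two is four")
print(model["is"])                    # 'four' gets 2/3, 'five' gets 1/3
print(most_likely_next(model, "is"))  # 'four', whatever numbers came before "is"
```

The model answers "four" after "is" even for "two plus three is", because it only tracks which word tends to follow, not the arithmetic.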

Available_Lion_652 t1_jcjt3yi wrote on March 17, 2023 at 10:41 AM

#2,251,958

Replying to NotARedditUser3 (#2,251,939)

Facebook's LLaMA tokenizer splits numbers into individual digits so that the model is better at arithmetic. The question I asked the model involves more than adding or subtracting numbers. The model must understand what a perfect cube is, which it does, but it must also avoid hallucinating while reasoning, which is where it fails.

[deleted] OP t1_jcjt6mc wrote on March 17, 2023 at 10:42 AM

#2,251,962

[deleted]

DamienLasseur t1_jcjt8gp wrote on March 17, 2023 at 10:43 AM

#2,251,964

Replying to boostwtf (#2,251,949)

Researchers have been using the term for a while now as well. It's mostly used when the model confidently outputs an incorrect answer, such as fake website links.

pobtastic t1_jcjtadp wrote on March 17, 2023 at 10:43 AM

#2,251,965

Replying to NotARedditUser3 (#2,251,943)

I did try a few follow-up prompts, but nothing changed the structure at all. I mean, it wasn't for any purpose other than testing it, but I would definitely have found it unsatisfactory if I'd really needed it for something work-related.

Available_Lion_652 t1_jcjtc6h wrote on March 17, 2023 at 10:44 AM

#2,251,970

Replying to Single_Blueberry (#2,251,942)

I don't understand why people downvoted. I saw a claim that GPT-4 was trained on 25k Nvidia A100s for several months; based on that post, it used 100x more compute than GPT-3. The 65B LLaMA model was trained on 1.4 trillion tokens. So yeah, my post is based on these claims.
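A back-of-envelope sketch of how an estimate like "25k A100s for several months means roughly 100x GPT-3" can be made. Every input is an assumption: the GPU count is the rumor from the thread, and 90 days and 40% hardware utilization are guesses. The A100 dense BF16 peak (312 TFLOPS) and GPT-3's training compute (~3.14e23 FLOPs) are published figures. The common approximation C ≈ 6·N·D (N parameters, D training tokens) then shows why more compute does not by itself imply more parameters.

```python
A100_PEAK_BF16_FLOPS = 312e12   # dense BF16 tensor-core peak of one A100
GPT3_TRAIN_FLOPS = 3.14e23      # training compute reported for GPT-3

def training_flops(num_gpus: int, days: float, utilization: float,
                   peak_flops: float = A100_PEAK_BF16_FLOPS) -> float:
    """Total FLOPs = GPUs x peak FLOP/s x fraction of peak achieved x seconds."""
    seconds = days * 24 * 3600
    return num_gpus * peak_flops * utilization * seconds

def params_for_budget(flops: float, tokens: float) -> float:
    """Invert C ~= 6 * N * D to get the parameter count N a budget allows."""
    return flops / (6 * tokens)

# Assumed inputs: 25k GPUs (rumor), ~3 months, 40% utilization (guess).
est = training_flops(num_gpus=25_000, days=90, utilization=0.4)
print(f"{est:.2e} FLOPs, about {est / GPT3_TRAIN_FLOPS:.0f}x GPT-3")

# The same budget buys very different model sizes depending on how many tokens are used.
for tokens in (1.4e12, 10e12):
    print(f"{tokens:.1e} tokens -> ~{params_for_budget(est, tokens):.1e} parameters")
```

Sanity check of the approximation: 6 × 175e9 parameters × 300e9 tokens ≈ 3.15e23 FLOPs, close to GPT-3's reported figure.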

ShredForMe t1_jcjtlj5 wrote on March 17, 2023 at 10:47 AM

#2,251,987

Replying to NotARedditUser3 (#2,251,943)

Then I might as well do that myself.

yumiko14 t1_jcju8tw wrote on March 17, 2023 at 10:55 AM

#2,252,021

Replying to NotARedditUser3 (#2,251,939)

Link to that article, please?

PM_ME_ENFP_MEMES t1_jcjubn0 wrote on March 17, 2023 at 10:56 AM

#2,252,025

I read something about LLMs and why they’re so bad at math: during the tokenisation process, numbers don’t automatically get tokenised as the actual number. So, 67 may be tokenised as a token representing ‘67’ and all would be well.

However, it’s also possible that 67 is tokenised as two tokens, ‘6’ and ‘7’, which may confuse the bot if it’s asked to compute 67^2.
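A toy illustration of the effect, assuming an invented vocabulary and a greedy longest-match tokenizer. Real GPT tokenizers use byte-pair encoding with learned merges, so this is not their algorithm, but it shows how the same digits can end up in different chunks depending on what surrounds them.

```python
def greedy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match tokenization over a fixed vocabulary."""
    max_len = max(map(len, vocab))
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return tokens

# Hypothetical toy vocabulary: every single digit, '^', plus two multi-digit chunks.
VOCAB = set("0123456789^") | {"67", "789"}

print(greedy_tokenize("67^2", VOCAB))   # ['67', '^', '2']
print(greedy_tokenize("789", VOCAB))    # ['789']
print(greedy_tokenize("6789", VOCAB))   # ['67', '8', '9']: '789' is no longer one token
```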

Available_Lion_652 t1_jcjukim wrote on March 17, 2023 at 10:59 AM

#2,252,048

Replying to PM_ME_ENFP_MEMES (#2,252,025)

Yes, there is already a fix for this problem. In the LLaMA paper they split numbers into individual digits: 12345 became 1 2 3 4 5, and 29 December became 2 9 December.

It helps with addition and subtraction, but not with complex reasoning.
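A minimal text-level sketch of the digit splitting described above. This only mimics the effect by inserting spaces between adjacent digits; the actual LLaMA tokenizer does the split inside its SentencePiece model rather than by rewriting the text, so treat this as an illustration, not its implementation.

```python
import re

def split_digits(text: str) -> str:
    """Insert a space between every pair of adjacent digits."""
    # Zero-width match: a position preceded by a digit and followed by a digit.
    return re.sub(r"(?<=\d)(?=\d)", " ", text)

print(split_digits("12345"))        # 1 2 3 4 5
print(split_digits("29 December"))  # 2 9 December
```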

Available_Lion_652 t1_jcjuxp5 wrote on March 17, 2023 at 11:03 AM

#2,252,070

Replying to yumiko14 (#2,252,021)

It's not an article. Someone on Twitter estimated the total compute based on a report that Microsoft had 25k A100 GPUs. That was all.

Logicalist t1_jcjv1pq wrote on March 17, 2023 at 11:04 AM

#2,252,075

Replying to DamienLasseur (#2,251,964)

"Delusion" would be far more accurate. "Hallucinate" is just wrong.

sweatierorc t1_jcjv2g4 wrote on March 17, 2023 at 11:05 AM

#2,252,077

"hype is a hell of a drug", rick james

Single_Blueberry t1_jcjvh6o wrote on March 17, 2023 at 11:09 AM

#2,252,096

Replying to Available_Lion_652 (#2,251,970)

Again, can't find a reliable source for that.

I personally doubt that GPT-4 is significantly larger than GPT-3.x, simply because that would also further inflate inference cost, which you generally want to avoid in a product (as opposed to a research feat).

Better architecture, better RLHF, more and better training data, more training compute? All seem reasonable.

Orders of magnitude larger again? I don't think so.
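The inference-cost argument can be made concrete with the usual rule of thumb that a dense transformer's forward pass costs about 2 FLOPs per parameter per generated token. This sketch ignores attention's dependence on context length, batching, and any sparsity or mixture-of-experts tricks; the 100x model is the hypothetical scenario from the thread, and 175B is GPT-3's published size.

```python
def forward_flops_per_token(num_params: float) -> float:
    """Rule of thumb for a dense decoder-only transformer: ~2 FLOPs per parameter per token."""
    return 2.0 * num_params

def serving_ratio(params_a: float, params_b: float) -> float:
    """How many times more compute each generated token costs for model b than model a."""
    return forward_flops_per_token(params_b) / forward_flops_per_token(params_a)

gpt3 = 175e9                # GPT-3's published parameter count
hypothetical = 100 * gpt3   # the "two orders of magnitude bigger" scenario
print(f"GPT-3: ~{forward_flops_per_token(gpt3):.1e} FLOPs/token")
print(f"100x model: ~{forward_flops_per_token(hypothetical):.1e} FLOPs/token "
      f"({serving_ratio(gpt3, hypothetical):.0f}x per token)")
```

Under this approximation, serving cost per token scales linearly with parameter count, which is why a product team would be reluctant to grow the model 100x.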

kaoD t1_jcjvsmo wrote on March 17, 2023 at 11:13 AM

#2,252,113

Replying to Available_Lion_652 (#2,251,958)

Looks like you don't understand the comment you're replying to.

olmec-akeru t1_jcjw333 wrote on March 17, 2023 at 11:16 AM

#2,252,129

Right, so ignoring the specific error and thinking about the general approach: adding a^3 introduces a fourth term, and it happens that a = 0.

Sneaky, but not illogical.

Edit: the above is wrong; read the thread below for OP's insights.

Available_Lion_652 t1_jcjw5cf wrote on March 17, 2023 at 11:17 AM

#2,252,132

Replying to kaoD (#2,252,113)

I understood the post really well. My comment was meant to add to it. I think you did not understand what I said.

Available_Lion_652 t1_jcjwf9c wrote on March 17, 2023 at 11:20 AM

#2,252,144

Replying to olmec-akeru (#2,252,129)

Yes, that was interesting :) but it failed at the addition operations.

SQLGene t1_jcjwhst wrote on March 17, 2023 at 11:21 AM

#2,252,148

"I purchased a Swiss Army Knife but it's a terrible calculator"

Available_Lion_652 t1_jcjwqg0 wrote on March 17, 2023 at 11:23 AM

#2,252,158

Replying to SQLGene (#2,252,148)

I should probably specify: the problem I gave GPT-4 to solve was a 5th-grade math Olympiad problem. Your statement is unfounded.

SQLGene t1_jcjx3a6 wrote on March 17, 2023 at 11:27 AM

#2,252,168

Replying to Available_Lion_652 (#2,252,158)

The title is "[D] GPT-4 is really dumb", and what would be accurate is "[D] GPT-4 is bad at math problems". This was a known issue with GPT-3.5, and I expect it to continue to be an issue. But I think it's a mischaracterization to say it's "dumb" when there are a number of non-mathematical applications where it's been impressive so far.

So while my statement was a simplification, I stand by the intention. You are evaluating a tool based on an application I don't think it's meant for.

JaCraig t1_jcjx7lx wrote on March 17, 2023 at 11:28 AM

#2,252,176

Genuine question: Why are you trying to use a language model to do something that you could write a basic app to calculate?

Like, I would have asked it to write an app in JavaScript, Java, C#, or some other popular language to find four perfect cubes that represent a number. That'd probably get me 90% of the way there, and then I'd just fix a couple of bugs. That seems like the more intuitive use case to me, but I'm also a dev by trade.
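A minimal sketch of the kind of "basic app" being suggested: a brute-force search for ways to write n as a sum of k cubes. It assumes non-negative integers only (the original problem statement isn't in the thread, and allowing negative cubes would need a different search bound), and it only enumerates representations; it doesn't replace the proof an Olympiad answer needs.

```python
from itertools import combinations_with_replacement

def sums_of_cubes(n: int, k: int) -> list[tuple[int, ...]]:
    """All nondecreasing k-tuples of non-negative integers whose cubes sum to n."""
    limit = round(n ** (1 / 3)) + 1  # no term can exceed the cube root of n
    return [
        combo
        for combo in combinations_with_replacement(range(limit + 1), k)
        if sum(x ** 3 for x in combo) == n
    ]

print(sums_of_cubes(1024, 3))  # [(0, 8, 8)]
print(sums_of_cubes(1024, 4))
```

With four terms, a zero can pad a three-term solution, e.g. (0, 0, 8, 8), which is the "fourth term equals 0" move discussed elsewhere in the thread.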

Available_Lion_652 t1_jcjxe16 wrote on March 17, 2023 at 11:30 AM

#2,252,184

Replying to SQLGene (#2,252,168)

Good remarks. This is my first post on this subreddit, and I didn't know what title to give it. I was angry at ClosedAI for not revealing model and dataset details.

Available_Lion_652 t1_jcjxja6 wrote on March 17, 2023 at 11:32 AM

#2,252,191

Replying to JaCraig (#2,252,176)

This is a 5th-grade math Olympiad problem. Sorry for not mentioning it. Good luck solving it with a basic app.

SQLGene t1_jcjxkwp wrote on March 17, 2023 at 11:32 AM

#2,252,195

Replying to Available_Lion_652 (#2,252,184)

There's a lot of hype and misinformation going around right now, thus the sassy remarks.

This is the best post I've seen about how it works behind the scenes. We shouldn't expect fancy autocomplete to be good at math.
https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/

olmec-akeru t1_jck223w wrote on March 17, 2023 at 12:16 PM

#2,252,448

Replying to Available_Lion_652 (#2,252,144)

Yeah, totally right—and I understand that the specifics really matter in some cases (for example calculating a starship trajectory).

What intrigues me is that at the level of concept and logic, this specific error isn't meaningful, i.e. if the sum of three primes had been correct initially, the approach wouldn't be invalid. There is something in this.

Available_Lion_652 t1_jck2th4 wrote on March 17, 2023 at 12:23 PM

#2,252,496

Replying to olmec-akeru (#2,252,448)

Not quite :). The second step, (a + b + c)^2014 = a^2014 + b^2014 + c^2014, is false. It does not understand complex math operations. To be honest, solving the above problem would mean it can do math better than most humans.
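A quick numeric check of why that step fails: the power of a sum is not the sum of the powers (already for exponent 2, (a + b + c)^2 = a^2 + b^2 + c^2 + 2(ab + bc + ca)). Python's arbitrary-precision integers make the exponent 2014 cheap to test directly; the helper name below is just for illustration.

```python
def power_of_sum_equals_sum_of_powers(a: int, b: int, c: int, n: int) -> bool:
    """Check whether (a + b + c)**n == a**n + b**n + c**n for specific integers."""
    return (a + b + c) ** n == a ** n + b ** n + c ** n

print(power_of_sum_equals_sum_of_powers(1, 1, 1, 2014))  # False: 3**2014 vs 1 + 1 + 1
print(power_of_sum_equals_sum_of_powers(1, 0, 0, 2014))  # True: 1**2014 == 1 + 0 + 0
```

A single counterexample such as a = b = c = 1 is enough to show the step cannot be used as a general identity.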

olmec-akeru t1_jck3hvp wrote on March 17, 2023 at 12:30 PM

#2,252,534

Replying to Available_Lion_652 (#2,252,496)

Precisely right; I hadn't applied my mind to that expansion. My comment is erroneous.

JaCraig t1_jckmll4 wrote on March 17, 2023 at 2:57 PM

#2,253,787

Replying to Available_Lion_652 (#2,252,191)

My point is more it's the wrong tool for the job. Something designed for calculations like wolfram alpha and their API is probably better suited:

https://www.wolframalpha.com/input?i=%28x%5E3%29%2B%28y%5E3%29%2B%28z%5E3%29+%3D+1024

BUT I did ask ChatGPT (so 3.5) to write an app to do it in a couple languages and it gave me a working app first try on each. It's not a very good app as I could optimize it a lot more, but it works. GPT-4 gave a slightly better app in each instance.

NotARedditUser3 t1_jckne7y wrote on March 17, 2023 at 3:02 PM

#2,253,825

Replying to Available_Lion_652 (#2,252,070)

He wasn't talking to you, dingus

[deleted] OP t1_jcknwkx wrote on March 17, 2023 at 3:06 PM

#2,253,859

Replying to yumiko14 (#2,252,021)

[deleted]

[deleted] OP t1_jcko3jr wrote on March 17, 2023 at 3:07 PM

#2,253,868

Replying to yumiko14 (#2,252,021)

[deleted]

NotARedditUser3 t1_jckof25 wrote on March 17, 2023 at 3:09 PM

#2,253,894

Replying to yumiko14 (#2,252,021)

https://vgel.me/posts/tools-not-needed/

Available_Lion_652 t1_jckrfwd wrote on March 17, 2023 at 3:29 PM

#2,254,067

Replying to NotARedditUser3 (#2,253,825)

I don't understand why you insulted me. I really tried to write a post about a case where GPT-4 hallucinates, with good intentions, but I guess you have to be a smartass.
