Submitted by [deleted] t3_11tmu9u in MachineLearning
Single_Blueberry t1_jcjr43p wrote
Yes, it's well-known that current language models are pretty bad at math
Available_Lion_652 t1_jcjrfnx wrote
I know that autoregressive models hallucinate, but training them on an enormous clean corpus of probably several trillion tokens and images, and the fact that GPT 4 may be two orders of magnitude bigger than GPT 3, didn't change the problem. The model still hallucinates.
NotARedditUser3 t1_jcjsqta wrote
All language models are currently trash at math. It's not an issue of training material, it's a core flaw in how they function.
People have had some success getting reasonable outputs from language models by using input-output chains that break the task up into smaller increments. Hallucination is still possible, though. I saw one really good article explaining that even tool-assisted chains have this problem. In such a chain, the model prints a token in one output to call a function in a PowerShell or Python script, the result appears in the next input, and the model uses it to generate the correct output later on. When the 'trusted' tool returns an unexpected number, the model sometimes still disregards it if it is drastically further off than what the model's own training would lead it to expect the answer to look like.
Which also makes sense: as we all know, a language model just calculates which words (or tokens, to be more exact) look appropriate next to each other. It very likely doesn't distinguish much between 123,456,789 and 123,684,849. Both probably get roughly the same score when it's looking for an answer to a math question, in that both score higher than some wildly different answer such as... 4.
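To make the tool-assisted chain described above concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `CALC(...)` marker format, the `safe_eval` and `run_tools` names, and the idea that the harness simply substitutes the result into the text fed back to the model. Real setups define their own formats, and this sketch cannot stop a model from ignoring the tool's number afterwards.

```python
import ast
import operator
import re

# Hypothetical marker the model is assumed to print when it wants a calculation.
# The simple pattern does not support nested parentheses inside the call.
CALL_PATTERN = re.compile(r"CALC\(([^)]*)\)")

# Only these arithmetic operators are allowed.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Pow: operator.pow,
}


def safe_eval(expr: str) -> int:
    """Evaluate integer arithmetic (+, -, *, **) without calling eval()."""

    def walk(node):
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {expr!r}")

    return walk(ast.parse(expr, mode="eval").body)


def run_tools(model_output: str) -> str:
    """Replace each CALC(...) marker with the exact computed result."""
    return CALL_PATTERN.sub(lambda m: str(safe_eval(m.group(1))), model_output)


if __name__ == "__main__":
    # The string below stands in for one model output in the chain.
    print(run_tools("The product is CALC(123456 * 789)."))
```

The design point is that the arithmetic happens outside the model, so the number in the next input is exact even if the model's own guess would not be.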
yumiko14 t1_jcju8tw wrote
link to that article please
NotARedditUser3 t1_jckof25 wrote
[deleted] OP t1_jcknwkx wrote
[deleted]
[deleted] OP t1_jcko3jr wrote
[deleted]
Available_Lion_652 t1_jcjuxp5 wrote
It's not an article. Someone on Twitter estimated the total compute power based on a report that Microsoft had 25k A100 GPU racks. That was all.
NotARedditUser3 t1_jckne7y wrote
He wasn't talking to you, dingus
Available_Lion_652 t1_jckrfwd wrote
I don't understand why you insulted me. I really tried to write a post about a case where GPT 4 hallucinates, with all good intentions, but I guess you have to be a smartass.
Available_Lion_652 t1_jcjt3yi wrote
The tokenizer of Llama from Facebook splits numbers into digits so that the model is better at math calculations. The question that I asked the model involves more than adding or subtracting numbers. The model must understand what a perfect cube is, which it does, but it also must not hallucinate when reasoning, which it fails at.
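As a rough illustration of the two ideas above, here is a small Python sketch. It is not the actual Llama tokenizer; `split_digits` is a hypothetical helper that only shows what "splitting numbers into digits" looks like. `is_perfect_cube` is likewise an assumed name, showing the kind of exact check a model's reasoning could be compared against.

```python
import re


def split_digits(text: str) -> list[str]:
    """Split text into pieces, breaking every number into single digits.

    Non-digit, non-space runs stay together. This mimics only the
    digit-splitting behavior, not a real subword tokenizer.
    """
    return re.findall(r"\d|[^\d\s]+", text)


def is_perfect_cube(n: int) -> bool:
    """Exact integer test using binary search, avoiding float cube roots."""
    n = abs(n)  # -27 is (-3)**3, so the sign does not matter
    lo, hi = 0, max(1, n)
    while lo <= hi:
        mid = (lo + hi) // 2
        cube = mid ** 3
        if cube == n:
            return True
        if cube < n:
            lo = mid + 1
        else:
            hi = mid - 1
    return False


if __name__ == "__main__":
    print(split_digits("123 + 45"))
    print(is_perfect_cube(27), is_perfect_cube(28))
```

Digit splitting gives every number a consistent representation, but as the comment says, it does not by itself guarantee that multi-step reasoning about those numbers is correct.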
kaoD t1_jcjvsmo wrote
Looks like you don't understand the comment you're replying to.
Available_Lion_652 t1_jcjw5cf wrote
I understood the post really well. My comment was an augmentation. I think you did not understand what I said
Single_Blueberry t1_jcjsxa1 wrote
>the fact that GPT 4 may be two orders of magnitude bigger than GPT 3
I'm not aware of any reliable sources that claim that.
Intuitively I don't see why it would stop hallucinating. I imagine the corpus - as big as it may be - doesn't contain a lot of examples for the concept of "not knowing the answer".
That's something people use a lot in private conversation, but not in written language on the public internet or books. Which afaik is where most of the data comes from.
Available_Lion_652 t1_jcjtc6h wrote
I don't understand why people downvoted. I saw a claim that GPT 4 was trained on 25k Nvidia A100s for several months. Based on that post, it used 100x more compute than GPT 3. The 20B Llama model was trained on 1.4 trillion tokens. So yeah, my post is based on these claims.
Single_Blueberry t1_jcjvh6o wrote
Again, can't find a reliable source for that.
I personally doubt that GPT-4 is significantly larger than GPT 3.x, simply because that would also further inflate inference cost, which you generally want to avoid in a product (as opposed to a research feat).
Better architecture, better RLHF, more and better training data, more training compute? All seems reasonable.
Orders of magnitude larger again? Don't think so.