Submitted by enryu42 t3_122ppu0 in MachineLearning
liqui_date_me t1_jdr8516 wrote
This comment about GPT-4’s limited ability at arithmetic was particularly interesting: https://www.reddit.com/r/singularity/comments/122ilav/why_is_maths_so_hard_for_llms/jdqsh5c/?utm_source=share&utm_medium=ios_app&utm_name=iossmf&context=3
Controversial take: GPT-4 is probably good for anything that needs lots of boilerplate code or text, like ingesting a book and writing an essay, or drafting rental contracts. There’s a lot of value in making that area of the economy more efficient for sure.
But for some of the more creative stuff it’s probably not as powerful and might actually hinder productivity. It still makes mistakes, and programmers are going to have to go back and fix those mistakes after the fact.
enryu42 OP t1_jdrbyh5 wrote
Arithmetic can be solved in a Toolformer-like way, by simply giving it access to a calculator. But this wouldn't help with coding.
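A minimal sketch of the calculator idea, assuming the model emits inline calls in a bracketed form like `[Calculator(12*34)]` (loosely modeled on the Toolformer paper's notation; the exact syntax here is hypothetical). The harness finds each call, evaluates the arithmetic with a restricted `ast` walker instead of `eval`, and splices the result back into the text:

```python
import ast
import operator
import re

# Only plain arithmetic is allowed; anything else raises ValueError.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

# Hypothetical call syntax: [Calculator(<expression>)]
_CALL = re.compile(r"\[Calculator\((.*?)\)\]")


def _evaluate(node):
    """Recursively evaluate a whitelisted arithmetic AST."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def _format(value):
    """Print whole numbers without a trailing '.0' and round long floats."""
    if isinstance(value, float):
        return str(int(value)) if value.is_integer() else str(round(value, 6))
    return str(value)


def fill_calculator_calls(text: str) -> str:
    """Replace every [Calculator(expr)] in model output with its computed value."""
    def run(match):
        tree = ast.parse(match.group(1), mode="eval")
        return _format(_evaluate(tree))
    return _CALL.sub(run, text)


print(fill_calculator_calls("That is [Calculator(400 / 1400)] of the total, or [Calculator((2 + 3) * 4)] items."))
# -> That is 0.285714 of the total, or 20 items.
```

The model only has to learn *when* to emit a call; the exact digits come from the tool, which is why this sidesteps arithmetic but not coding.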
Regarding the point about boilerplate, this is exactly what is surprising: GPT4 performs very well on exams/tests, which supposedly require some amount of creative reasoning. So either the tests are poorly designed, or it can do some creative tasks but not others. If the latter is the case, it would be interesting to learn in which areas it performs well, and why.
liqui_date_me t1_jdrd9dx wrote
One could argue that even standardized tests are somewhat boilerplate: if you practice enough SAT tests, you’ll eventually do quite well at them, since the questions are quite similar from exam to exam. Ditto for AP exams.
I think a serious test of GPT4’s intelligence will be one of the competitive entrance exams in some countries, like the IIT-JEE or the Gaokao, or the International Math Olympiad, where the questions are written by domain experts and deliberately designed to be difficult and specialized.
enryu42 OP t1_jdrezba wrote
I don't know about IIT-JEE/Gaokao, but many of the problems from the International Math Olympiad are freaking hard. If the model aims for human-level intelligence, such a high bar would be unfair: it is more in the realm of "best human"-level intelligence.
To be fair, the hardest problems from "AtCoder Grand" contests have the same issue. But "AtCoder Regular" problems should definitely be solvable by an average human with the right knowledge and skill set, and yet GPT4 cannot solve any of them (and it doesn't look like it is lacking knowledge).
blose1 t1_jdskab0 wrote
These models have access to all human knowledge, all scientific papers, books, etc. If I had such knowledge, I could solve any Olympiad task.
visarga t1_jdtxxfd wrote
You're mistaken, Olympiad problems require bespoke tricks that don't generalise from problem to problem. It's not a problem of breadth of knowledge, they don't test memorisation.
blose1 t1_jdu4cln wrote
What? Where exactly am I mistaken? Both of my statements are true. There is a 0% chance you can pass an Olympiad task without knowledge. A human with all the knowledge WILL reason and come up with a solution BASED on the knowledge he has AND the experience of others that is part of that knowledge; if that weren't true, no human would solve any Olympiad problem. Sorry, but what you wrote, in the context of my comment, is just ridiculous and looks like a reply to something I didn't write.
currentscurrents t1_jdrt3gv wrote
I think all tests designed for humans are worthless here.
They're all meant to compare humans against each other, so they assume you don't have the ability to read and remember the entire internet. You can make up for a lack of reasoning with an abundance of data. We need synthetic tests designed specifically for LLMs.
Yecuken t1_jdsm4w1 wrote
Tests would not help against optimization; models will just learn how to pass the test. Optimization will always win against any problem with a known solution.
maxToTheJ t1_jdrx281 wrote
> which supposedly require some amount of creative reasoning.
They don't, which is exactly what has been part of teachers' complaints about standardized testing.
farox t1_jdrfllu wrote
This is pretty much it. Just yesterday I needed to write a Python web UI. So I described roughly what I needed, and it gave me a solution. It had a couple of errors but gave me a basis to work from. It saved me a lot of "how do I do X with Flask", but there was little complexity involved. For anything complex, I am sure it would take me longer to describe it than to implement the logic myself.
ngildea t1_jdrzg0p wrote
I agree, but is that opinion controversial? Seems patently obvious after talking to it about coding for a few minutes. Maybe it's controversial among people who have fooled themselves into thinking it's thinking?
liqui_date_me t1_jds3b6q wrote
I would say it's controversial among many folks who aren't directly involved in programming and who get impressed by cute demos on Twitter. People who actually know how to code see it as a superpower to make themselves more efficient, while also lamenting how it makes silly mistakes.
https://www.reddit.com/r/cscareerquestions/comments/1226hcn/im_worried_about_ai_taking_our_jobs/
I highly doubt that software engineering jobs will become obsolete. There's going to be a lot of disruption and there might be some wage deflation too (imagine the price of writing the boilerplate components of an iOS app goes from 50,000 dollars to 50 dollars), but so much of software engineering is testing, QA and human collaboration. I think we're just going to have to re-orient our careers around correcting code from LLMs.
ngildea t1_jds53pl wrote
Yeah, I agree with all that. I've been trying to think of an analogy. Maybe in the same way that spreadsheets didn't make accountants obsolete?
robobub t1_jdswai9 wrote
Indeed, they just made them more efficient, so we need fewer of them and/or pay them less.
No_Brief_2355 t1_jdthreb wrote
Fewer bookkeepers and lower pay, but accountants (CPAs) are in pretty high demand and still well paid.
__scan__ t1_jdxj30g wrote
This is what will happen if we’ve either a) exhausted demand, or b) made software development much easier such that people who previously couldn’t do it now can.
The first was likely true for accountants, but is less obviously so for software — there’s still vastly more useful software to build than actually gets built, and each piece of new software that gets built generally increases that demand.
Perhaps the second is true, though. Do you foresee enough non-developers being able to write, deploy, maintain, and operate production systems as a result of LLMs (in a way that high-level languages and previous tooling didn't allow)? If not, or if not in sufficient numbers, maybe what happens is that software developers become more in demand than ever, because their productivity gains create even more demand for software (since they can write it quicker).
reditum t1_jds67d4 wrote
>Controversial take
That's not controversial at all
trajo123 t1_jdsflhh wrote
> like ingesting a book
Interestingly, LLMs currently can't naturally ingest a book, since it doesn't fit in the prompt (they can fit 32K tokens, which is about 24K words). This is where GPTs differ fundamentally from the human brain. GPTs always produce one token at a time, given the full prompt. No state is kept between token generation steps other than the prompt, which grows one token at a time. The human brain, on the other hand, has a state that evolves continuously. In the case of a book, our brain state is affected by the content of the book as we read it.
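The "one token at a time, no state but the prompt" loop can be sketched like this. `next_token` is a hypothetical stand-in for the model; the point is that each step sees only the last `context_len` tokens, so anything earlier (say, the first half of a book) never reaches the model at all:

```python
from typing import Callable, List


def generate(
    next_token: Callable[[List[int]], int],
    prompt: List[int],
    n_new: int,
    context_len: int,
) -> List[int]:
    """Autoregressive decoding: the growing token list is the only state."""
    if context_len <= 0:
        raise ValueError("context_len must be positive")
    tokens = list(prompt)
    for _ in range(n_new):
        window = tokens[-context_len:]  # the model only ever sees this slice
        tokens.append(next_token(window))  # exactly one new token per step
    return tokens


# Toy "model" that echoes the oldest token it can still see.
print(generate(lambda window: window[0], [1, 2, 3, 4, 5], n_new=3, context_len=4))
# -> [1, 2, 3, 4, 5, 2, 3, 4]  (token 1 fell out of the window and is never echoed)
```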
LLMs need to be able to hold more state to get to the next level. Perhaps get augmented with some sort of LSTM architecture where state can be built up from a theoretically infinite amount of input, or have another compressed/non-human-readable prompt that gets read before generating the token and gets updated after generating the token.
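For contrast, a recurrent model carries a fixed-size state that is updated once per input token, so its memory does not grow with input length. The sketch below is a toy Elman-style RNN update (simpler than an LSTM, with random untrained weights chosen only for illustration), not a proposal for how an LLM would actually be augmented:

```python
import math
import random
from typing import Iterable, List


class RecurrentMemory:
    """Fixed-size state h, updated per token as h <- tanh(W h + E[token])."""

    def __init__(self, state_size: int = 8, vocab_size: int = 100, seed: int = 0):
        rng = random.Random(seed)  # untrained, illustrative weights
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(state_size)]
                  for _ in range(state_size)]
        self.E = [[rng.uniform(-1.0, 1.0) for _ in range(state_size)]
                  for _ in range(vocab_size)]
        self.h: List[float] = [0.0] * state_size

    def read(self, tokens: Iterable[int]) -> List[float]:
        """Consume any number of tokens; the state never grows."""
        for tok in tokens:
            x = self.E[tok]
            self.h = [
                math.tanh(sum(w * h for w, h in zip(row, self.h)) + x[i])
                for i, row in enumerate(self.W)
            ]
        return self.h


mem = RecurrentMemory()
state = mem.read(t % 100 for t in range(50_000))  # a long stream of token ids
print(len(state))  # -> 8
```

The trade-off is that everything read so far must be squeezed into those few numbers, whereas attention can look back at every token still inside the window.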
visarga t1_jdtyd0c wrote
> Perhaps get augmented with some sort of LSTM architecture where state can be built up from a theoretically infinite amount of input
That would be sweet, infinite input. Does RWKV do it?
robobub t1_jdsria4 wrote
While GPT-4 is autoregressive, at each step it takes into account the tokens it has already generated. So it is only limited to O(1) computation if it tries to produce the correct answer immediately. In theory it can take O(m) steps, where m is the number of intermediate tokens it predicts.
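A toy illustration of that point (not a claim about what GPT-4 does internally): long multiplication broken into single-digit steps, each needing only constant work given the earlier ones. Writing those steps out as intermediate tokens is what would let an autoregressive model spend more sequential computation on a bigger problem; here the number of steps grows as len(a) * len(b):

```python
from typing import List, Tuple


def scratchpad_multiply(a: int, b: int) -> Tuple[int, List[str]]:
    """Multiply non-negative ints using only single-digit multiply-add steps.

    Each logged step needs constant work given the earlier ones, loosely
    like one intermediate token in a chain-of-thought answer.
    """
    if a < 0 or b < 0:
        raise ValueError("non-negative inputs only")
    xs = [int(d) for d in reversed(str(a))]  # least-significant digit first
    ys = [int(d) for d in reversed(str(b))]
    acc = [0] * (len(xs) + len(ys))  # result digits, least-significant first
    steps: List[str] = []
    for j, y in enumerate(ys):
        carry = 0
        for i, x in enumerate(xs):
            total = acc[i + j] + x * y + carry  # at most 9 + 81 + 9 = 99
            acc[i + j], carry = total % 10, total // 10
            steps.append(f"{x}*{y} at position {i + j}: digit {acc[i + j]}, carry {carry}")
        acc[j + len(xs)] = carry  # this position is untouched until now
    product = int("".join(str(d) for d in reversed(acc)))
    return product, steps


product, steps = scratchpad_multiply(1234, 5678)
print(product, len(steps))  # -> 7006652 16
```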
fiftyfourseventeen t1_jdt29u3 wrote
I've wasted too much time trying to do basic tasks with it as well. For example, I argued with it over many messages about something that was blatantly wrong, and it insisted it wasn't. (In that case, it was trying to use order by similarity with an argument to sort by Euclidean distance or cosine similarity, but it really didn't want to accept that cosine similarity isn't a distance metric and therefore has to be treated differently when sorting.)
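The comment doesn't say which "order by similarity" API was involved, so this is only a plain-Python sketch of the underlying sorting issue: a distance ranks nearest-first when sorted ascending, while a similarity must be sorted descending. The function names are illustrative, and the code assumes non-zero vectors:

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """In [-1, 1]; larger means more similar. Assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def rank_neighbors(query, items, measure: str) -> List:
    """Return items ordered from most to least similar to query."""
    if measure == "euclidean":
        # A distance: smaller is closer, so sort ascending.
        return sorted(items, key=lambda v: math.dist(query, v))
    if measure == "cosine":
        # A similarity: larger is closer, so sort descending.
        return sorted(items, key=lambda v: cosine_similarity(query, v), reverse=True)
    raise ValueError(f"unknown measure: {measure}")


query = (1.0, 0.0)
items = [(0.0, 1.0), (3.0, 0.0), (1.0, 0.5)]
print(rank_neighbors(query, items, "euclidean"))  # -> [(1.0, 0.5), (0.0, 1.0), (3.0, 0.0)]
print(rank_neighbors(query, items, "cosine"))     # -> [(3.0, 0.0), (1.0, 0.5), (0.0, 1.0)]
```

Sorting cosine similarity ascending, as you would a distance, puts the least similar item first. Using `1 - cosine_similarity` as a "cosine distance" fixes the ordering, though it still isn't a true metric because it can violate the triangle inequality.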
My most recent one was where I wasted an hour doing something that was literally one line of code. I had videos at all different framerates, and I wanted to convert them all to 16 fps while affecting length and speed as little as possible. It gave me a couple of solutions that just straight up didn't work; I had to manually fix a ton of things in them, and I finally ended up with a scuffed and horrible solution. It wouldn't give me a better algorithm, so I tried to write one on my own, when I thought, "I should Google whether there's a simpler solution." From that Google search I learned, "oh, there's literally just a .set_fps() method."
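The comment doesn't name the library, so this does not reproduce its `.set_fps()`; it is a generic, hypothetical sketch of the idea behind such a conversion. To change the frame rate while keeping duration and playback speed, each output frame at time k / dst_fps shows the latest source frame at or before that time, which duplicates frames when upsampling and drops them when downsampling:

```python
import math
from typing import List


def resample_frame_indices(n_src_frames: int, src_fps: float, dst_fps: float) -> List[int]:
    """Source frame index for each output frame, preserving total duration."""
    duration = n_src_frames / src_fps  # seconds; unchanged by resampling
    n_dst = round(duration * dst_fps)
    return [
        # latest source frame at or before output time t = k / dst_fps
        min(math.floor(k * src_fps / dst_fps), n_src_frames - 1)
        for k in range(n_dst)
    ]


print(resample_frame_indices(8, 8, 16))  # -> [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7]
print(resample_frame_indices(64, 32, 16)[:5])  # -> [0, 2, 4, 6, 8]
```

This nearest-earlier-frame rule ignores blending and floating-point edge cases with non-integer rates like 29.97 fps; it only shows why the conversion is an index mapping rather than a re-timing of the clip.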
Anyway, from using it I feel like it's helpful, but not as much as people make it out to be. Honestly, GitHub Copilot has been way more helpful, because it can autocomplete things that are common but take forever to write, like command-line args and their descriptions, or pieces of repetitive code.
Haycart t1_jdtcnc5 wrote
Where are they getting O(1) from? Has some new information been released regarding GPT-4's architecture?
The standard attention mechanism in a transformer decoder (e.g. GPT 1-3) has a time complexity of O(N^2) w.r.t. the combined input and output sequence length. Computing the output autoregressively introduces another factor of N for a total of O(N^3).
There are fast attention variants with lower time complexity, but has there been any indication that GPT-4 actually uses these? And in any case, I'm not aware of any fast attention variant that could be described as having O(1) complexity.
visarga t1_jdtypz6 wrote
Doesn't autoregressive decoding cache the states for the previous tokens when decoding a new token?
Haycart t1_jdu7hlp wrote
Oh, you are probably correct. So it'd be O(N^2) overall for autoregressive decoding, which still exceeds the O(n log n) that the linked post says is required for multiplication.
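To make the two estimates concrete, here is a rough count of query-key dot products when decoding N tokens (a single layer and head; projection and MLP costs are ignored). Without caching, step t recomputes attention over all t tokens, counted here as t * t scores as a dense implementation would compute them before masking. With a key/value cache, step t scores only the one new query against t cached keys:

```python
def scores_without_cache(n: int) -> int:
    """Recompute full t x t attention at every step t = 1..n: ~n**3 / 3."""
    return sum(t * t for t in range(1, n + 1))


def scores_with_kv_cache(n: int) -> int:
    """Score only the newest query against t cached keys: ~n**2 / 2."""
    return sum(t for t in range(1, n + 1))


for n in (1_000, 8_000, 32_000):
    print(n, scores_without_cache(n), scores_with_kv_cache(n))
```

The cached count matches the O(N^2) total above, and the uncached count matches the original O(N^3) estimate.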