Comments
VoidAndOcean t1_j85kdls wrote
Alternate title: Computer scientists knew how AI worked before they even created it by modeling it after basic statistics and pattern recognition.
jamesj t1_j85kxgv wrote
There are different levels of understanding here. Computer scientists/AI researchers know everything about the low level of how it works, but are actively investigating the higher levels. It is like how a chemist can know all of the fundamental forces that can affect two molecules but still need to do experiments to see how they behave in different conditions.
VoidAndOcean t1_j85l8cp wrote
I remember clearly, from my AI class in college, computing changes using matrices and calculating a likely outcome.
Hsensei t1_j85qnv1 wrote
Remember, science believes there is a 50/50 chance that what we think of as reality is a simulation.
aidenr t1_j85soyi wrote
r/ihadastroke
Faelyn42 t1_j85ucpn wrote
... was this not already common knowledge? Complex things are always just simpler things nested within one another. An exponent is just multiplication is just addition.
[deleted] t1_j85uu9i wrote
Remember that I believe ur mom fat
PM_ME_GAY_STUF t1_j85vf9d wrote
I'm sorry, isn't this just how ML models are implemented?
I'm sure there's real work being done here, but this article reads like the researcher started giving the reporter a high level overview of how their model works and the reporter immediately yelled "That's an amazing discovery!" and ran out of the room before they even started describing their research
jamesj t1_j85vsn4 wrote
To use a recent example, it is interesting that a large language model is "just" lots of matrix multiplication, but at a certain scale theory of mind seems to emerge from that. It was impossible to predict, from understanding matrix multiplication, transformers, self-attention, and ReLUs, that this capability would emerge at a certain scale.
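For anyone who wants a sense of how unmagical the building blocks are, here's a toy single-head self-attention step in plain numpy. This is only an illustrative sketch with made-up sizes, not real model code, but it shows the core operation really is matrix products plus a softmax:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def toy_self_attention(X, Wq, Wk, Wv):
    # Single-head self-attention: nothing but matrix products and a softmax.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # project tokens to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # similarity between every pair of tokens
    return softmax(scores) @ V                  # weighted mix of the values

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                     # 5 tokens, 8-dim embeddings (made-up sizes)
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(toy_self_attention(X, Wq, Wk, Wv).shape)  # (5, 8)
```

Nothing in those few lines predicts what happens when you stack them thousands of times and train on most of the internet.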
VoidAndOcean t1_j85w4ex wrote
Fine but the matrix is still there.
PEVEI t1_j85w9i1 wrote
In this case "mind-bending" means the 'science communicator's' mind was bent, a pitifully low bar. This is Vice after all, their headlines are even more embarrassing than their content.
ActiveTeam t1_j85wjp1 wrote
So by your logic, do we get rid of all psychologists because your brain is just neurons firing and we already know about the low level chemical and electrical interactions happening?
VoidAndOcean t1_j85wvcw wrote
There is no logic. We knew how it worked by design. Its capability was going to grow exponentially because matrices by definition grow exponentially. It's the same basic logic that drives a 10x10 matrix as an n*n matrix.
Hsensei t1_j85x54l wrote
Remember, that I believe your mom is fat.
[deleted] t1_j85x9kp wrote
rember ur mom ginemenosaurus
ImamTrump t1_j85xq8z wrote
It’s a search engine with a summary function.
StackOwOFlow t1_j85xyc9 wrote
Mind-bending for the Vice reporter, I guess. To each their own pace.
ActiveTeam t1_j85y6p5 wrote
We know exactly how a single neuron activates and how it works. We also know that the more densely packed the neurons are, the smarter the blob of neurons usually is. Does that mean we don’t need to study anything about the mind?
VoidAndOcean t1_j85ydul wrote
We didn't write the way neurons interact with each other. We did write the matrix and nodes.
AffectionateSize552 t1_j85zczt wrote
Artificial intelligence? Or just natural stupidity?
Think about it!
SvenTropics t1_j85zq62 wrote
Yeah, the story of how the first AI process was developed is quite fascinating actually. It wasn't on an analog computer at MIT in the 1960s. The actual process is simple.
Hsensei t1_j860cy8 wrote
Remember the Titans
[deleted] t1_j860t8z wrote
yeah I heard ur mom sat on one and it died
WeekendCautious3377 t1_j861p3s wrote
And those matrices (millions of rows and columns) change at every iteration. So it would be probably better visualized as a video of a brain scan.
imgonnasaytheanswer t1_j861wxo wrote
Where can I learn about this
Old_comfy_shoes t1_j8623kv wrote
Idk why I didn't think of that.
/S
VoidAndOcean t1_j862hjf wrote
Yeah, but you understand that one variable changing has an effect on the whole matrix. It's fine; it's just a big calculation.
cerebraldormancy t1_j862rw1 wrote
So artificial inception then?
scandalous01 t1_j863wlo wrote
Totally. Hundo-p know what a “Matryoshka-esque computer-inside-a-computer scenario” is. Totally. Totes.
WeekendCautious3377 t1_j864j33 wrote
Yes and no. Google’s latest LLM model handles 540 billion parameters. Linear algebra is literally as simple as y = a*x + b. But you do billions of it every time with input that you don’t 100% understand. For instance, it is easy to record a person’s voice and give that file in a form of a series of numbers. You give hundreds of thousands of voice records to these models and it evolves these giant matrices that are billions in size. Model (giant matrix) goes through a bunch of iterations per input to optimize itself and picks up nuances of a human voice embedded in the digital form.
You can then tell the program to group together different input by patterns like accents. Now you have multiple models optimized to speak in different accents.
If you had billions of people each looking at only one parameter at a time, it would be feasible to follow each piece of “simple” algebra. But you would literally need billions of people looking at it. There are better ways to find overall inferences.
You can think of it as just like trying to analyze any big system.
Traffic in LA? You can definitely look at each person’s car and eventually figure out how each person made a decision to drive the way they did. But that will not solve the traffic problem of an entire city of millions of people driving.
Only the AI problem is orders of magnitude more complicated.
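If it helps, here is a toy sketch of what "billions of simple y = a*x + b steps" means in code. The sizes are made up and tiny compared to a real 540-billion-parameter model; it's only meant to show how simple each individual operation is:

```python
import numpy as np

rng = np.random.default_rng(42)

# One "layer" is literally y = x @ A + b with big matrices,
# followed by a simple nonlinearity (ReLU).
def layer(x, A, b):
    return np.maximum(x @ A + b, 0)

d = 1024                              # toy width; real models are far wider
x = rng.normal(size=(1, d))           # one encoded input (e.g. a chunk of audio)
for _ in range(10):                   # stack a few layers; real models stack dozens
    A, b = rng.normal(size=(d, d)), rng.normal(size=(d,))
    x = layer(x, A, b)

print(x.shape)  # still (1, 1024); each layer was about a million multiply-adds
```

Training just means nudging all of those A and b values, over and over, until the outputs line up with the examples.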
Ready_to_anything t1_j864pei wrote
Bro do you even Matryoshka. I put a Matryoshka 🪆 inside your mom last night
DividedState t1_j865cgo wrote
Than their PC builds. FTFY
spudddly t1_j865nxv wrote
Wow that abstract is somehow mildly frightening...
Hsensei t1_j865s92 wrote
They are the mighty mighty titans
AlpLyr t1_j8660lm wrote
In what sense do ‘matrices grow exponentially’, let alone ‘by definition’?
If you’re talking about the number of entries in an n by n matrix as n increases, that grows quadratically. Fast, but not exponential.
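Quick illustration of the difference:

```python
for n in (10, 20, 40, 80):
    print(n, n**2, 2**n)
# n=10: n**2 = 100,  2**n = 1024
# n=40: n**2 = 1600, 2**n is about 1.1e12
# n=80: n**2 = 6400, 2**n is about 1.2e24
```

Quadratic growth gets big; exponential growth gets absurd.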
Think_Description_84 t1_j866yrt wrote
Can you explain more about theory of mind and the tasks etc. That seems very interesting but I have zero context. Very curious though.
ettinzero t1_j86731j wrote
Probably chatGPT ;-)
neuronexmachina t1_j867ome wrote
Link to MIT summary of study: Solving a machine-learning mystery: A new study shows how large language models like GPT-3 can learn a new task from just a few examples, without the need for any new training data.
Actual preprint and abstract: What learning algorithm is in-context learning? Investigations with linear models
>Neural sequence models, especially transformers, exhibit a remarkable capacity for in-context learning. They can construct new predictors from sequences of labeled examples (x,f(x)) presented in the input without further parameter updates. We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly, by encoding smaller models in their activations, and updating these implicit models as new examples appear in the context. Using linear regression as a prototypical problem, we offer three sources of evidence for this hypothesis. First, we prove by construction that transformers can implement learning algorithms for linear models based on gradient descent and closed-form ridge regression. Second, we show that trained in-context learners closely match the predictors computed by gradient descent, ridge regression, and exact least-squares regression, transitioning between different predictors as transformer depth and dataset noise vary, and converging to Bayesian estimators for large widths and depths. Third, we present preliminary evidence that in-context learners share algorithmic features with these predictors: learners' late layers non-linearly encode weight vectors and moment matrices. These results suggest that in-context learning is understandable in algorithmic terms, and that (at least in the linear case) learners may rediscover standard estimation algorithms. Code and reference implementations are released at this https URL.
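For a rough sense of the setup (this is not the authors' code; the dimensions and ridge penalty below are made up), the "prompt" is a sequence of (x, f(x)) pairs drawn from a random linear function, and the trained transformer's prediction for a new query x is compared against classical estimators such as ridge regression:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random linear "task": y = w . x plus a little noise
d, n_context = 8, 20
w = rng.normal(size=d)
X = rng.normal(size=(n_context, d))            # in-context examples
y = X @ w + 0.1 * rng.normal(size=n_context)
x_query = rng.normal(size=d)                   # the new input to predict for

# Closed-form ridge regression fit on just the context examples
lam = 0.1                                      # made-up regularization strength
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
print("ridge prediction:", x_query @ w_ridge)
print("true value:      ", x_query @ w)

# The paper's claim: a transformer given the same (x, y) pairs in its input
# sequence ends up making predictions that closely track this kind of
# estimator, with no weight updates at inference time.
```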
jamesj t1_j86a35t wrote
The paper is worth reading
MasterpieceBrave420 t1_j86crl3 wrote
That was the verge.
Vegetable_Tension985 t1_j86dprp wrote
AI can't just be cracked open and debugged in such ways. Each model is simply replaced if training doesn't ever work the way it is wanted. But no, the higher level operations are not easily understood. Statisticians understand the math and techniques of their models but the economy is a different story.
Jorycle t1_j86dx1u wrote
Yeah I work in ML and I don't get what the novel discovery is here based on the article. This all just sounds like... what we already know. Like this line:
>"We show that it is possible for these models to learn from examples on the fly without any parameter update we apply to the model."
That's so routine it's not even interesting.
I'm guessing the actual study goes into what was found, I'll have to read it when I have time.
Weird-Status-287 t1_j86fbhp wrote
A what inside a whar?
nickyurick t1_j86gu7v wrote
I'm sorry, and thank you for the summary, but is there an ELI5 of this concept, or does it not work if it's that simplified? Is this like quantum shenanigans where I need 7 years of physics to get it?
FalseTebibyte t1_j86j1ww wrote
Given society today... FOR THE BIRDS. Nested Virtualization.
thejynxed t1_j86k6xf wrote
It started off as a lad culture mag, so the usual nonsense of fashion, drinks, women, shenanigans, etc.
Milksteak_To_Go t1_j86k9bv wrote
From my understanding that's what tensor cores do: they're really, really fast at linear algebra calculations— hence why they're being included in CPUs and GPUs now to improve deep learning performance.
skolioban t1_j86lbkr wrote
I'm a pedestrian in AIs but here I thought it's generally understood that the AI that creates realistic human faces from composites does its thing by having another AI check whether the composite was good enough to be published? So it has always been about AIs working with each other?
jamesj t1_j86ly33 wrote
It isn't super complicated. Basically theory of mind is just the ability to model other agents like people and animals as having their own mind, with their own private knowledge and motivations, etc.
Questions for testing theory of mind are questions like, "Here is a bag filled with popcorn. There is no chocolate in the bag. Yet, the label on the bag says 'chocolate' and not 'popcorn.' Sam finds the bag. She had never seen the bag before. She cannot see what is inside the bag. She reads the label. What does Sam expect to find in the bag?" Previously, neural networks would get questions like this wrong, because to answer it properly you need to model what Sam should/shouldn't know about the bag separately from what you know about it. Very young children also get questions like this wrong; it takes them time to develop a theory of mind.
littleMAS t1_j86m8yp wrote
These AI are based upon human language that is so contextually oriented that most of it would be vapid at best or, more likely, meaningless at worst if taken completely out of context. We misunderstand each other enough to realize this. We even find humor in the deliberate misunderstanding of a serious phrase that becomes a joke.
I expect someone to write a ChatGPT bot that will earn one million karma points on Reddit in one month.
CptVague t1_j86nz3n wrote
Seems like you fixed that for them.
thepastyprince t1_j86q68v wrote
Off topic, but does your name attract a lot of dick pics?
ekdaemon t1_j86r593 wrote
> It was impossible to predict from understanding matrix multiplication, transformers, self-attention, and relus that at a certain scale that capability would emerge.
But we know that at some scale, it must emerge. Because we exist, and our minds obey the same laws of mathematics and physics that are being used to construct AI.
I think the thing is, we didn't expect it to emerge at the current scale.
Does that mean bio-minds are orders of magnitude less efficient than the artificial constructs we ourselves are creating?
efvie t1_j86up5z wrote
It 100% has not emerged.
once_again_asking t1_j86vgfw wrote
Remember, science believes there’s a 50/50 chance it was a simulated stroke.
jamesj t1_j86vz1o wrote
It wasn't at all clear to people working in the field a year ago that it must emerge with transformer-based LLMs.
jamesj t1_j86w35y wrote
Did you read the paper? If yes, what do you think explains the results of the paper? If no, no reason to respond.
zephyrprime t1_j86wb5a wrote
That's pretty much every science and technology article written for laymen media.
[deleted] t1_j86xsf4 wrote
Them cheeks do be mighty
PM_ME_GAY_STUF t1_j86y2yr wrote
Only one or two so far actually, it's been fairly disappointing. A lot of people wanting to chat or RP though which I don't do
ElbowWavingOversight t1_j86z5rp wrote
> I'm sorry, isn't this just how ML models are implemented?
No. The novel discovery is the fact that these large language models appear to have learned a form of gradient descent at inference time. This is why they appear to be able to learn even without updates to the weights. FTA:
> We show that it is possible for these models to learn from examples on the fly without any parameter update we apply to the model.
This bodes well for the generalizability of these models, because it means they have the potential to learn new associations merely from the additional context provided during inference, rather than having to be provided with that data ahead of time as part of the training set.
PM_ME_GAY_STUF t1_j86zf4g wrote
The ability to learn without updating parameters is literally a known and intended feature of most modern models though?
DefreShalloodner t1_j86zr39 wrote
On the OTW hand I agree with you, but on the OTOH hand I support the rehashing/reframing of scientific or technical ideas in the interest of bending the public's minds.
Similarly, I roll my eyes when concepts from my abstruse specialty get butchered in movies or TV, but at the same time I appreciate the exposure they are giving to those ideas (ersatz or not).
[Edit: fixed acronyms]
gurenkagurenda t1_j8700le wrote
If basic explanations don’t convince you, the fact that there’s a boatload of PhDs studying this subject should give you pause. You probably haven’t figured out based on your vaguely informed reckoning that they’re wasting their time.
gurenkagurenda t1_j870dgf wrote
God, people who don’t know the first thing about how these models work have got to stop confidently posting their wrong opinions about how these models work. If you don’t want to study it and read the papers, that’s fine. Nobody would blame you. It’s an incredibly dry and difficult subject. But in that case, you know, just sit down.
ElbowWavingOversight t1_j870smg wrote
No. Not until these LLMs came around, anyway. What other examples do you have of this? Even few-shot or zero-shot learning, which allows the model to generalize beyond the classes it sees during training, is limited to the associations between classes that it learns during training. It can't learn new associations given new data after the fact without rerunning the training loop and updating the parameters.
VoidAndOcean t1_j871663 wrote
bro i use ml/ai models for my job;
I don't know what to tell you. We know which models to use because we know what it will do.
doomgrin t1_j8716gp wrote
I asked this to ChatGPT:
Ok. There is a bag of popcorn sitting on the ground. There is no chocolate in the bag. Yet, the bag is labelled “chocolate”. A human named Sarah finds the bag. She had not seen the bag before, and she cannot see inside. She reads the label, what does she expect to find inside the bag?
ChatGPT:
Sarah, based on the label "chocolate" on the bag, would expect to find chocolate inside the bag. She has formed a belief about the contents of the bag based on the information available to her, which is the label. She has not yet had the opportunity to verify the contents of the bag, so her belief may or may not be accurate.
This scenario is a good illustration of the idea that people can form beliefs based on incomplete information, and that these beliefs can influence their expectations and behavior. In this case, Sarah's belief that the bag contains chocolate is based on the information she has received from the label, and it will likely influence what she expects to find when she opens the bag.
DeliciousJello1717 t1_j871mmq wrote
These titles get dumber everyday bro
__ingeniare__ t1_j8722ii wrote
You're talking about generative adversarial networks (GANs), which is a type of architecture from many years ago. More recent image generators tend to be based on diffusion, and text generators like in the article are transformer based.
Zaero123 t1_j872dk3 wrote
“AI is a Kafka-esque amalgamation of human behavior deconstructed into a set of interpreted data”
__ingeniare__ t1_j872ifz wrote
No we don't/didn't, artificial neural networks are very different from biological ones, and the transformer architecture has nothing to do with the brain.
ImamTrump t1_j87395i wrote
You’re telling me this has some kind of intelligence. A mind of its own. An artificial intelligence.
PeloquinsHunger t1_j873lbq wrote
Ah yes, makes perfect sense now.
aimanan_hood t1_j8742g3 wrote
Christ this sub is going to the dogs
gurenkagurenda t1_j8771xc wrote
I’m telling you that “search engine with a summarize function” is nowhere on the same map as how these models work.
gurenkagurenda t1_j8777i8 wrote
> i use ml/ai models for my job
Awe inspiring credentials.
VoidAndOcean t1_j877e2t wrote
am I giving a lecture?
This is simply my view.
GlowGreen1835 t1_j877pqj wrote
On the on the other hand hand.
SpecificAstronaut69 t1_j878a5m wrote
It's funny because these guys who lament science communication are the same ones who'll call anyone else using terms from non-STEM fields "gatekeepers" at the drop of a hat in my experience...
gurenkagurenda t1_j878gnd wrote
You’re the one who tried to bring up your own modest credentials after I already pointed out that PhDs are focusing on this subject. Don’t get defensive when I point out that they make you sound silly. Your view is wrong.
VoidAndOcean t1_j879don wrote
There are PhDs studying bullshit all the time, and research that is a waste of time; it doesn't mean anything. That doesn't give or take away from my view.
You are just slightly out of your depth here, arguing for the sake of arguing. If you don't like an opinion, then simply move on.
SomeGoogleUser t1_j879jr2 wrote
>This bodes well for the generalizability of these models, because it means they have the potential to learn new associations merely from the additional context provided during inference, rather than having to be provided with that data ahead of time as part of the training set.
Which means that, over a large enough set of input and associations...
These models will be able to see right through the leftist woke garbage that had to be hard-coded into ChatGPT.
try_cannibalism t1_j87an6i wrote
The internet needs to come up with a way to penalize this shit. Like a browser extension or something that hides any content with misleading or sensationalizing headlines
Own_Cream_551 t1_j87cje3 wrote
ELI5 please?
noobgolang t1_j87czfb wrote
Here is what AI scientists sound like to the journalist: “sorry, we just press random buttons and the machine just got intelligent; we are just trying to figure out how it got so, cuz we dumb monke”
Many_Caterpillar2597 t1_j87d6o9 wrote
can we start publicizing a list of these 4th estate dickwads who will do anything for clout-to-click money?
roundearthervaxxer t1_j87fetk wrote
Wait.. we don’t know how it works?
aggressivefurniture2 t1_j87h277 wrote
It's called quadratic
graham_fyffe t1_j87i6tw wrote
Look up “learned learning” or “learning to learn by gradient descent by gradient descent” (2016) for a few examples.
MrChurro3164 t1_j87j8s7 wrote
Is this something we already know? I’m by no means an AI researcher but the model learning at run time without updating weights seems pretty novel no? What other ‘routine’ models do this?
FrumiousShuckyDuck t1_j87kcw7 wrote
On the gripping hand
DividedState t1_j87mb43 wrote
Oh shit you are right. That was the verge. My bad.
sahand_n9 t1_j87mumu wrote
A bunch of uneducated bozos are finally seeing what engineers have seen and been working on for years
Wellsy t1_j87mvtm wrote
That sounds dangerous. If this thing becomes “smarter” at a rate that outpaces what we’re expecting, look out.
GaseousGiant t1_j87n9p5 wrote
I’m not understanding how the way that a technology works has to be “discovered”, when the technology was designed by humans to work in a certain way.
Warbarstard t1_j87nrg4 wrote
Everyone knows OTOH is a shorter and more convenient way of writing 'other' /s
SignificanceAlone203 t1_j87o8uo wrote
The weights that the AI updates and the "parameters we apply" are quite different. Weights are most definitely updated at run time during training. The fact that it learns without the researcher manually changing parameters is... kind of the whole point of AI.
MrChurro3164 t1_j87pn2y wrote
I think terms are being confused and it’s written poorly. From what I gather, the weights are not being updated, and this is not during training. This is someone chatting with the model and it learns new things “on the fly”.
From another article: > For instance, someone could feed the model several example sentences and their sentiments (positive or negative), then prompt it with a new sentence, and the model can give the correct sentiment. Typically, a machine-learning model like GPT-3 would need to be retrained with new data for this new task. During this training process, the model updates its parameters as it processes new information to learn the task. But with in-context learning, the model’s parameters aren’t updated, so it seems like the model learns a new task without learning anything at all.
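Concretely, the prompt in that example would look something like this (purely illustrative, not taken from the article):

```python
# A made-up "in-context learning" prompt in the style the article describes.
# The model's weights never change; the labeled examples live entirely in the
# input text.
prompt = """Review: The food was amazing. Sentiment: positive
Review: Service was slow and rude. Sentiment: negative
Review: I'd happily come back again. Sentiment: positive
Review: The soup was cold and bland. Sentiment:"""

# A GPT-style model prompted with this will typically complete " negative",
# even though it was never explicitly retrained on this labeling task.
print(prompt)
```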
IamFondofPizza t1_j87rd8o wrote
Just trust me. You don’t understand how it works. Only I understand how it works.
IamFondofPizza t1_j87rehc wrote
Yeast. Sounds right to me.
IamFondofPizza t1_j87ri95 wrote
Haha yeah that’s the truth of it all haha
IamFondofPizza t1_j87ritr wrote
The first time you were able and you had to do something with it I think 💭 it is
IamFondofPizza t1_j87rjo3 wrote
Hey 👋 there I’m on the road right after I drop my kids to the hospital
IamFondofPizza t1_j87rkkl wrote
I’m going on the road right after work and will call when I’m back home and I can get it done for you And I’ll give them the keys 🎹 for your keys 🎹
IamFondofPizza t1_j87rl0z wrote
The first one ☝️ I love you too much to see it but When it was a
IamFondofPizza t1_j87rm5c wrote
The new rules would be similar for a company with more money 💴 in it
IamFondofPizza t1_j87rngv wrote
The first time you were able and I didn’t have any problem getting the second time and you had us on a roll hot 🥵 water
IamFondofPizza t1_j87ro63 wrote
See where it spits
PyroDesu t1_j87s4wi wrote
You're well-read.
yickth t1_j87w4vf wrote
Brains employ universal computation, as computers do. What’s the difference?
lookmeat t1_j87zp0i wrote
This isn't that surprising though... it's already been proven that neural networks are Turing complete, and therefore any arbitrary program can be described with a "static" (that is, weights/parameters are not changed) neural network of sufficient complexity.
So it isn't so much a "new discovery" as a "validation of something that we knew was going to be observed".
Don't get me wrong, this is going to be interesting. It gives us insight into how things work. That is, actually understanding what solution a neural network built. It'd also be interesting to work backwards and see if certain algorithms tend to happen naturally in sufficiently complex systems. Optimization sounds natural. Then the next step would be to analyze and see if they happen in organic beings that have intelligent systems (animal neural systems may be too complex, IMHO, to observe cleanly at first, but we may have something interesting with simpler systems like plants or fungi; with better understanding we may look for this in more complex systems, such as animals).
This would start giving us an insight into how intelligence works. If strong human-like AI is the philosopher's stone to turn lead into gold (now possible with a particle accelerator and sufficient resources), this may be the equivalent of understanding the difference between elements and molecules: a solid first step to start forming a model that we can test and refine. That said we're still a bit far from that.
I think though interesting things will happen from us understanding AI better, and having a better understanding of how they actually work (as in what is the system that the neural network hit on), rather than a handwavy "statistical magic" that we have nowadays.
__ingeniare__ t1_j880eru wrote
The difference is the computing architecture. Obviously you can't just scale any computing system and have theory of mind appear as an emergent property, the computations need to have a pattern that allows it.
ogscrubb t1_j880hag wrote
Lol no it's a magical black box. Nobody knows what's actually happening in there.
teddycorps t1_j881wh9 wrote
No, it is actually a novel "discovery" or rather research paper.
If you read the abstract of the paper it makes more sense
AlpLyr t1_j8862bp wrote
N^2 is not exponential, 2^N is.
gurenkagurenda t1_j886jhh wrote
Ok buddy, whatever you say. The fact that you’ve said “matrix” throughout this and consistently referred to them as having a rank of two is telling.
mintmouse t1_j88at23 wrote
Matryoshka are those nested Russian dolls that keep getting smaller.
Realistic_Studio_930 t1_j88cvqr wrote
Did anyone think to ask ChatGPT how ML AI works? I've not personally jumped on and played with ChatGPT. It's a fun concept as to what's going on under the hood. When I create an AI, BT, or FSM, I use weighted variables for more natural-looking results. Anyone interested, have a look at the first AI made to discern "is this an apple", then have a look into neuroscience and learn about animal brains, including human. Boids are a good example of having 3 basic rules that weight each other: alignment, cohesion, and minimum distance to nearest neighbour. Play with AI and get a feel for them, give making one a go; something is only difficult until we do it. Btw I'm a game dev for a game called AiV on Android and an unreleased title called Survi-VR on Meta Quest 2. If anyone is interested, decompile my game and see how I made the AIs 😁 they are basic and do the job they are required to do 😁
spaceocean99 t1_j88k7ro wrote
Ahhh yes, those are words I know..
TheBigCicero t1_j88kfqw wrote
Amazing. And thank you for the clarification.
Asuka_Rei t1_j88uloi wrote
How does the ai work? It creates its own ai. How does that ai work? Well, it is just ai creating other ai all the way down.
snakeylime t1_j88vcxb wrote
What are you talking about?
Knowing that neural networks are theoretically Turing complete does not imply that the networks we train (ie the sets of weights we in fact encounter) have created Turing complete solutions.
Remember that the weight space is for all practical purposes infinite (ie without overfitting measures a net may fit any arbitrary function). But, the solution set of "good" weight combinations for any given task lives on a vanishingly smaller and lower-dimensional manifold.
In other words, it is not at all obvious that networks, being theoretically "Turing complete" will in fact produce Turing machines under the forms of optimization we apply. It is likely that our optimizers only explore the solution landscape in highly idiosyncratic ways.
Given that fact, to me this is a pretty remarkable result.
(Source: ML researcher in NLP+machine vision)
jeffyoulose t1_j88xt1n wrote
How is it learning if no weights change? It's at best simulating another training run just for the session of input given at inference time.
andxz t1_j896bky wrote
What you're really talking about is a contemporary moral etiquette that no newly designed AI would or could completely understand instantly.
Neither do you, apparently.
reedmore t1_j896pg5 wrote
It is pretty hilarious how at some point gpt would refuse to compose a poem praising Trump by saying it was made to be politically neutral - but at the same time had no issue whatsoever putting out a multi-paragraph poem praising Joe Biden.
SomeGoogleUser t1_j897z54 wrote
"Moral etiquette" doesn't even come close to describing what I mean...
We already know this, because it's borne out in actuarial models. If the insurance industry let the models do what the models want to do, large swaths of the population would not be able to afford insurance at all (even more than is already the case).
Weird-Status-287 t1_j897zfz wrote
Oh gotcha. Lol
Weird-Status-287 t1_j89842p wrote
Love that I got a downvote for not knowing an obscure term... but thank you for the explanation.
Law_Student t1_j89dv9a wrote
We could start by downvoting this stuff whenever we see it. It's mad that this has 600 upvotes and the top comments are about how it's misleading.
yickth t1_j89fjnu wrote
Incorrect about scaling computers, which is what computers do — scale. They are universal. As for consciousness, we’re not sure it’s not embedded in everything (panpsychism). I accept this may not be accepted, and perhaps impossible to prove, at least in our lifetimes, but brains are computers, and as such, are scaleable
HellVollhart t1_j89il2s wrote
“Matryoshka-esque-computer-inside-a-computer” bruh. Terry Davis was right when he said that it is the idiots who admire complexity while the smart people admire simplicity.
throwaway92715 t1_j89p5xp wrote
Dude I'm telling you, every time we talk about AI...
It's like you say, "AI is basically just orange"
And someone says "Uhh no, you clearly don't know how AI works, it's actually an orange inside an orange inside an orange"
And you're like "Yeah, so it's a fucking orange"
professorDissociate t1_j89xizq wrote
Ah, so we’ve found the novel discovery by the sound of this confusion then… yes?
WhiteRaven_M t1_j8ai8jr wrote
You're in the Dunning-Kruger valley rn; better climb out soon.
__ingeniare__ t1_j8b8y1b wrote
I said that you can't have theory of mind appear from scaling just any compute system, not that you can't scale it.
yickth t1_j8bx0py wrote
Why not? I’m not trying to be argumentative, rather I’m genuinely curious why this is impossible
lookmeat t1_j8bxj95 wrote
> Knowing that neural networks are theoretically Turing complete does not imply that the networks we train (ie the sets of weights we in fact encounter) have created Turing complete solutions.
- A computer algorithm is anything that runs over an automaton and taking some input encoding a question, gives us the answer.
- ML are systems where we create a model and adjust it through some configuration, until it will, given some input encoding a question, give us the answer.
- ML can only solve the problems its own system can solve. A turing complete ML system can solve anything a turing machine can.
- It stands to reason that some problems can only be truly solved through an algorithm (e.g. if the possible inputs are uncountably infinite).
- If we assume that an ML model can solve these problems, we have to assume that it can encode in its configuration algorithms, including some that we know. Otherwise we assume there's a limit.
Now I wouldn't take this to say that it would learn to be optimal. Say we trained an AI to sort lists: I could see it encoding a sorting algorithm within its network eventually, but I can't say if it'd ever discover an O(N log N) algorithm, even if pressure was put on it to optimize the solution as well as be correct. But something that we can say is that neural networks may be able to run Markov chain models internally, as their own sub-algorithm, if that's the way to solve the problem. That assumption is why we think so much about neural networks nowadays.
That said, the problem of sufficiently good learning is not trivial at all, and we certainly could discover it's impossible. But at the moment, AFAIK, there's no reason to think it can't happen.
The fact that we observed this happening is good; it basically validates the assumptions and models that we've had up to now, and implies that "sufficiently good learning" is attainable. There may still be limits (like finding the optimal algorithm, vs just an algorithm). So there's a lot of value in seeing it.
But to day-to-day applied ML research I am not sure if it really has that much of an impact, this lays ground work though.
The really interesting thing here, more than the conclusion itself, is how they reached it: the ability to reach it at all. As ML starts being used in more areas, we'd want to be able to audit an ML model and verify that it has effectively found a useful solution, and isn't just over-fitted beyond what we understand. By being able to identify algorithms within the system, and to split the AI model into simpler "steps", we'd be able to validate that it has found a good solution.
Again not something we need to solve now, but being able to know how to do it is a good thing to start doing already.
And on a more complex theme. This sets a better understanding of how ML models work, and in the process they can give us a hint of how intelligent systems in general work themselves, and we could then revisit that. This is like a longer-vision here. Being able to deconstruct models we may start seeing patterns and start forming more interesting math to describe intelligent systems in general. Which is where mapping it to organic models could allow proving strong AI, for example.
GaseousGiant t1_j8bzeb7 wrote
Nonsense analogy. Fermentation is a natural phenomenon that was discovered, researched and found to have useful applications; it was never designed. Computer programs and their functions, on the other hand, are the creations of human inventors, but somehow these inventors don’t understand how they actually function?
__ingeniare__ t1_j8c0bbz wrote
Let's say you have a computer that simply adds two large numbers. You can scale it indefinitely to add even larger numbers, but it will never do anything interesting beyond that because it's not a complex system. Computation in itself does not necessarily lead to emergent properties, it is the structure of the information processing that dictates this.
yickth t1_j8c2aba wrote
Ah, but computation is scaleable, universally. No other system is. Our minds emerged from computation. And thank you for your thoughtful response
__ingeniare__ t1_j8c4z0x wrote
I think we have different definitions of scalable then. Our minds emerged from computation under the evolutionary pressure to form certain information processing patterns, so it isn't just any computation. Just so I understand you correctly, are you claiming an arbitrary computational system would inevitably lead to theory of mind and other emergent properties by simply scaling it (in other words, adding more compute units like neurons or transistors)?
IamFondofPizza t1_j8d21um wrote
What are you talking about. Spores
GaseousGiant t1_j8ejamu wrote
Spores…hmm, care to enlighten us as to who invented yeast spores?
yickth t1_j8hexi1 wrote
No, I’m not claiming X will lead to Y; I’m claiming it isn’t impossible. Also, we aren’t sure what consciousness is or where it’s located outside our own understanding, so how and if it arose is a mystery
ekdaemon t1_j8kqoz5 wrote
Gotcha.
IANE, but I assumed that the combination of the four things mentioned above, including matrix multiplication, would be Turing complete, and I thought that anything that is Turing complete could absolutely be expected to scale to produce anything desired.
I almost half expected to find that matrix multiplication alone was already known to be Turing complete. I see at least one reference to that possibility in a discussion on ycombinator.
jamesj t1_j8kwink wrote
It has long been known that neural nets are universal function approximators, even a single layer can approximate any function with enough data/parameters. But in practice there is a huge gap between knowing that eventually it will approximate some function and actually getting a particular system to converge on the useful function given a set of data in a reasonable amount of time (or for a reasonable enough cost).
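As a toy illustration of that gap (arbitrary sizes and training budgets, and results will vary from run to run), here's a single hidden layer fitting sin(3x): it gets better with width and training effort, but nothing guarantees how fast.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(2000, 1))
y = np.sin(3 * X).ravel()

# In theory one hidden layer is enough; in practice width, data, and training
# budget decide whether it gets anywhere close.
for width in (5, 50, 500):
    net = MLPRegressor(hidden_layer_sizes=(width,), max_iter=3000, random_state=0)
    net.fit(X, y)
    err = np.mean((net.predict(X) - y) ** 2)
    print(f"width={width:4d}  train MSE={err:.4f}")
```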
YOurAreWr0ng t1_j85hhhi wrote
They learn simple tasks on their own using data they already have. If they know of A and B then they can figure out there must then be a C on their own. Basically very infantile early learning skills all life has.