Comments

YOurAreWr0ng t1_j85hhhi wrote

They learn simple tasks on their own using data they already have. If they know of A and B, then they can figure out on their own that there must be a C. Basically the very early, infantile learning skills that all life has.

2

VoidAndOcean t1_j85kdls wrote

Alternate title: Computer scientists knew how AI worked before they even created it, by modeling it after basic statistics and pattern recognition.

248

jamesj t1_j85kxgv wrote

There are different levels of understanding here. Computer scientists/AI researchers know everything about the low level of how it works, but are actively investigating the higher levels. It is like how a chemist can know all of the fundamental forces that affect two molecules but still needs to do experiments to see how they behave in different conditions.

83

Hsensei t1_j85qnv1 wrote

Remember, science believes there is a 50/50 chance that what we think of as reality is a simulation.

−13

Faelyn42 t1_j85ucpn wrote

... was this not already common knowledge? Complex things are always just simpler things nested within one another. An exponent is just multiplication is just addition.

36

PM_ME_GAY_STUF t1_j85vf9d wrote

I'm sorry, isn't this just how ML models are implemented?

I'm sure there's real work being done here, but this article reads like the researcher started giving the reporter a high level overview of how their model works and the reporter immediately yelled "That's an amazing discovery!" and ran out of the room before they even started describing their research

408

PEVEI t1_j85w9i1 wrote

In this case "mind-bending" means the 'science communicator's' mind was bent, a pitifully low bar. This is Vice after all, their headlines are even more embarrassing than their content.

93

VoidAndOcean t1_j85wvcw wrote

There is no logic. We knew how it worked by design. Its capability was going to grow exponentially because matrices by definition grow exponentially. It's the same basic logic whether it's driving a 10x10 matrix or an n*n matrix.

−22

ImamTrump t1_j85xq8z wrote

It’s a search engine with a summary function.

−6

StackOwOFlow t1_j85xyc9 wrote

mind-bending for the vice reporter ig. to each their own pace

29

ActiveTeam t1_j85y6p5 wrote

We know exactly how a single neuron activates and how it works. We also know that the more densely packed the neurons are, the smarter the blob of neurons usually is. Does that mean we don’t need to study anything about the mind?

17

AffectionateSize552 t1_j85zczt wrote

Artificial intelligence? Or just natural stupidity?

Think about it!

−1

scandalous01 t1_j863wlo wrote

Totally. Hundo-p know what a “Matryoshka-esque computer-inside-a-computer scenario” is. Totally. Totes.

7

WeekendCautious3377 t1_j864j33 wrote

Yes and no. Google’s latest LLM has 540 billion parameters. Linear algebra is literally as simple as y = a*x + b. But you do billions of those operations every time, with input that you don’t 100% understand. For instance, it is easy to record a person’s voice and give that file in the form of a series of numbers. You give hundreds of thousands of voice recordings to these models, and they evolve these giant matrices that are billions in size. The model (a giant matrix) goes through a bunch of iterations per input to optimize itself and picks up the nuances of a human voice embedded in the digital form.
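
For a sense of scale, here is a minimal numpy sketch (mine, not from the comment) of what one layer of that "y = a*x + b, billions of times" actually is: a matrix multiply plus a bias applied to a whole vector at once.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=512)            # input features (e.g. a chunk of audio as numbers)
W = rng.normal(size=(1024, 512))    # weight matrix: 1024 * 512 = ~half a million "a"s
b = rng.normal(size=1024)           # one "b" per output

y = W @ x + b                       # y = a*x + b, done ~half a million times in one line
# A 540-billion-parameter model is layers like this, stacked and repeated,
# with nonlinearities in between.
```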

You can then tell the program to group together different input by patterns like accents. Now you have multiple models optimized to speak in different accents.

If you had billions of people each only looking at one parameter at a time, it would be feasible to follow each “simple” algebra. But you literally need billions of people looking at it. There are better ways to find overall inferences.

You can think of it as just like trying to analyze any big system.

Traffic in LA? You can definitely look at each person’s car and eventually figure out how each person decided to drive the way they did. But that will not solve the traffic problem of an overall city of millions of people driving.

Only the AI problem is orders of magnitude more complicated.

11

Ready_to_anything t1_j864pei wrote

Bro do you even Matryoshka. I put a Matryoshka 🪆 inside your mom last night

6

AlpLyr t1_j8660lm wrote

In what sense do ‘matrices grow exponentially’, let alone ‘by definition’?

If you’re talking about the number of entries in an n by n matrix as n increases, that grows quadratically. Fast, but not exponential.

3

neuronexmachina t1_j867ome wrote

Link to MIT summary of study: Solving a machine-learning mystery: A new study shows how large language models like GPT-3 can learn a new task from just a few examples, without the need for any new training data.

Actual preprint and abstract: What learning algorithm is in-context learning? Investigations with linear models

>Neural sequence models, especially transformers, exhibit a remarkable capacity for in-context learning. They can construct new predictors from sequences of labeled examples (x,f(x)) presented in the input without further parameter updates. We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly, by encoding smaller models in their activations, and updating these implicit models as new examples appear in the context. Using linear regression as a prototypical problem, we offer three sources of evidence for this hypothesis. First, we prove by construction that transformers can implement learning algorithms for linear models based on gradient descent and closed-form ridge regression. Second, we show that trained in-context learners closely match the predictors computed by gradient descent, ridge regression, and exact least-squares regression, transitioning between different predictors as transformer depth and dataset noise vary, and converging to Bayesian estimators for large widths and depths. Third, we present preliminary evidence that in-context learners share algorithmic features with these predictors: learners' late layers non-linearly encode weight vectors and moment matrices. These results suggest that in-context learning is understandable in algorithmic terms, and that (at least in the linear case) learners may rediscover standard estimation algorithms. Code and reference implementations are released at this https URL.
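
For concreteness, here is a toy numpy sketch (assumptions mine, not code from the paper) of the linear-regression setup the abstract describes: labeled in-context examples (x, f(x)) and the closed-form ridge predictor the trained in-context learners are reported to match. The transformer itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 16                                 # input dimension, number of in-context examples
w_true = rng.normal(size=d)                  # hidden linear function f(x) = w_true . x
X = rng.normal(size=(n, d))                  # in-context inputs
y = X @ w_true + 0.1 * rng.normal(size=n)    # noisy labels shown in the prompt

x_query = rng.normal(size=d)                 # query the model must answer from context alone

# Closed-form ridge regression on the context (lam = regularization strength),
# one of the predictors the paper compares the in-context learner against.
lam = 0.1
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
print("ridge prediction:", w_ridge @ x_query)
print("true value:      ", w_true @ x_query)
```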

20

Vegetable_Tension985 t1_j86dprp wrote

AI can't just be cracked open and debugged in such ways. Each model is simply replaced if training never works the way it's wanted to. But no, the higher-level operations are not easily understood. Statisticians understand the math and techniques of their models, but the economy is a different story.

1

Jorycle t1_j86dx1u wrote

Yeah I work in ML and I don't get what the novel discovery is here based on the article. This all just sounds like... what we already know. Like this line:

>"We show that it is possible for these models to learn from examples on the fly without any parameter update we apply to the model."

That's so routine it's not even interesting.

I'm guessing the actual study goes into what was found, I'll have to read it when I have time.

59

nickyurick t1_j86gu7v wrote

I'm sorry, and thank you for the summary, but is there an ELI5 of this concept, or does it not... work if it's that simplified? Is this like quantum shenanigans where I need 7 years of physics to get it?

1

FalseTebibyte t1_j86j1ww wrote

Given society today... FOR THE BIRDS. Nested Virtualization.

0

skolioban t1_j86lbkr wrote

I'm a pedestrian in AI, but here I thought it was generally understood that the AI that creates realistic human faces from composites does its thing by having another AI check whether the composite is good enough to be published? So it has always been about AIs working with each other?

−2

jamesj t1_j86ly33 wrote

It isn't super complicated. Basically theory of mind is just the ability to model other agents like people and animals as having their own mind, with their own private knowledge and motivations, etc.

Questions for testing theory of mind are questions like, "Here is a bag filled with popcorn. There is no chocolate in the bag. Yet, the label on the bag says 'chocolate' and not 'popcorn.' Sam finds the bag. She had never seen the bag before. She cannot see what is inside the bag. She reads the label. What does Sam expect to find in the bag?" Previously, neural networks would get questions like this wrong, because to answer properly you need to model what Sam should/shouldn't know about the bag separately from what you know about it. Very young children also get the answer to questions like this wrong; it takes them time to develop a theory of mind.

17

littleMAS t1_j86m8yp wrote

These AIs are based upon human language, which is so contextually oriented that most of it would be vapid at best or, more likely, meaningless if taken completely out of context. We misunderstand each other enough to realize this. We even find humor in the deliberate misunderstanding of a serious phrase that becomes a joke.

I expect someone to write a ChatGPT bot that will earn one million karma points on Reddit in one month.

1

ekdaemon t1_j86r593 wrote

> It was impossible to predict from understanding matrix multiplication, transformers, self-attention, and relus that at a certain scale that capability would emerge.

But we know that at some scale, it must emerge. Because we exist, and our minds obey the same laws of mathematics and physics that are being used to construct AI.

I think the thing is, we didn't expect it to emerge at the current scale.

Does that mean bio-minds are orders of magnitude less efficient than the artificial constructs we ourselves are creating?

1

ElbowWavingOversight t1_j86z5rp wrote

> I'm sorry, isn't this just how ML models are implemented?

No. The novel discovery is the fact that these large language models appear to have learned a form of gradient descent at inference time. This is why they appear to be able to learn even without updates to the weights. FTA:

> We show that it is possible for these models to learn from examples on the fly without any parameter update we apply to the model.

This bodes well for the generalizability of these models, because it means they have the potential to learn new associations merely from the additional context provided during inference, rather than having to be provided with that data ahead of time as part of the training set.
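
As a toy sketch of what "learned a form of gradient descent" means here (assumptions mine, not the paper's code): the learning algorithm the frozen transformer is hypothesized to run implicitly over its in-context examples is something like ordinary gradient descent on a small linear model.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 32
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true                       # labeled examples that would sit in the prompt

w = np.zeros(d)                      # the "implicit model", per the paper encoded in activations
for _ in range(100):                 # gradient descent on squared error over the context
    w -= 0.05 * (-2.0 / n) * X.T @ (y - X @ w)

x_new = rng.normal(size=d)
print("implicit-GD prediction:", w @ x_new, " true:", w_true @ x_new)
# The point of the paper: the transformer's own weights never change; something
# equivalent to this loop appears to happen inside a single forward pass.
```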

75

DefreShalloodner t1_j86zr39 wrote

On the OTW hand I agree with you, but on the OTOH hand I support the rehashing/reframing of scientific or technical ideas in the interest of bending the public's minds.

Similarly, I roll my eyes when concepts from my abstruse specialty get butchered in movies or TV, but at the same time I appreciate the exposure they are giving to those ideas (ersatz or not).

[Edit: fixed acronyms]

13

gurenkagurenda t1_j8700le wrote

If basic explanations don’t convince you, the fact that there’s a boatload of PhDs studying this subject should give you pause. You probably haven’t figured out based on your vaguely informed reckoning that they’re wasting their time.

4

gurenkagurenda t1_j870dgf wrote

God, people who don’t know the first thing about how these models work have got to stop confidently posting their wrong opinions about how these models work. If you don’t want to study it and read the papers, that’s fine. Nobody would blame you. It’s an incredibly dry and difficult subject. But in that case, you know, just sit down.

4

ElbowWavingOversight t1_j870smg wrote

No. Not until these LLMs came around, anyway. What other examples do you have of this? Even few-shot or zero-shot learning, which lets a model generalize beyond the classes it sees during training, is limited to the associations between classes that it learns during training. It can't learn new associations given new data after the fact without rerunning the training loop and updating the parameters.

20

doomgrin t1_j8716gp wrote

I asked this to ChatGPT:

Ok. There is a bag of popcorn sitting on the ground. There is no chocolate in the bag. Yet, the bag is labelled “chocolate”. A human named Sarah finds the bag. She had not seen the bag before, and she cannot see inside. She reads the label, what does she expect to find inside the bag?

ChatGPT:

Sarah, based on the label "chocolate" on the bag, would expect to find chocolate inside the bag. She has formed a belief about the contents of the bag based on the information available to her, which is the label. She has not yet had the opportunity to verify the contents of the bag, so her belief may or may not be accurate.

This scenario is a good illustration of the idea that people can form beliefs based on incomplete information, and that these beliefs can influence their expectations and behavior. In this case, Sarah's belief that the bag contains chocolate is based on the information she has received from the label, and it will likely influence what she expects to find when she opens the bag.

14

__ingeniare__ t1_j8722ii wrote

You're talking about generative adversarial networks (GANs), which is a type of architecture from many years ago. More recent image generators tend to be based on diffusion, and text generators like in the article are transformer based.

2

gurenkagurenda t1_j878gnd wrote

You’re the one who tried to bring up your own modest credentials after I already pointed out that PhDs are focusing on this subject. Don’t get defensive when I point out that they make you sound silly. Your view is wrong.

2

VoidAndOcean t1_j879don wrote

There are PhDs studying bullshit all the time, and plenty of research is a waste of time; it doesn't mean anything. That doesn't give or take away from my view.

You are just slightly out of your depth here, arguing for the sake of arguing. If you don't like an opinion, then simply move on.

−2

SomeGoogleUser t1_j879jr2 wrote

>This bodes well for the generalizability of these models, because it means they have the potential to learn new associations merely from the additional context provided during inference, rather than having to be provided with that data ahead of time as part of the training set.

Which means that, over a large enough set of input and associations...

These models will be able to see right through the leftist woke garbage that had to be hard-coded into ChatGPT.

−36

noobgolang t1_j87czfb wrote

Here is how AI scientists look to the journalist: “Sorry, we just press random buttons and the machine just got intelligent; we are just trying to figure out how it got so, cuz we dumb monke.”

5

MrChurro3164 t1_j87j8s7 wrote

Is this something we already know? I’m by no means an AI researcher but the model learning at run time without updating weights seems pretty novel no? What other ‘routine’ models do this?

3

sahand_n9 t1_j87mumu wrote

A bunch of uneducated bozos are finally seeing what engineers have seen and been working on for years

1

Wellsy t1_j87mvtm wrote

That sounds dangerous. If this thing becomes “smarter” at a rate that outpaces what we’re expecting, look out.

1

GaseousGiant t1_j87n9p5 wrote

I’m not understanding how the way that a technology works has to be “discovered”, when the technology was designed by humans to work in a certain way.

5

SignificanceAlone203 t1_j87o8uo wrote

The weights that the AI updates and the "parameters we apply" are quite different. Weights are most definitely updated at run time during training. The fact that it learns without the researcher manually changing parameters is... kind of the whole point of AI.

8

MrChurro3164 t1_j87pn2y wrote

I think terms are being confused and it’s written poorly. From what I gather, the weights are not being updated, and this is not during training. This is someone chatting with the model and it learns new things “on the fly”.

From another article: > For instance, someone could feed the model several example sentences and their sentiments (positive or negative), then prompt it with a new sentence, and the model can give the correct sentiment. Typically, a machine-learning model like GPT-3 would need to be retrained with new data for this new task. During this training process, the model updates its parameters as it processes new information to learn the task. But with in-context learning, the model’s parameters aren’t updated, so it seems like the model learns a new task without learning anything at all.
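
A minimal sketch of that sentiment example (the model call at the end is a placeholder, not a specific API): the only "training data" for the new task is a handful of labeled lines placed in the prompt, and no weight is ever updated.

```python
# Few-shot sentiment classification via in-context learning.
examples = [
    ("I loved this movie", "positive"),
    ("The service was terrible", "negative"),
    ("What a fantastic meal", "positive"),
]
query = "The plot dragged and the acting was wooden"

prompt = "\n".join(f"Sentence: {s}\nSentiment: {label}" for s, label in examples)
prompt += f"\nSentence: {query}\nSentiment:"

print(prompt)
# completion = llm_complete(prompt)   # hypothetical call; no retraining, no parameter update
```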

5

lookmeat t1_j87zp0i wrote

This isn't that surprising though... it's already been proven that neural networks are Turing complete, and therefore any arbitrary program can be described with a "static" (that is, weights/parameters are not changed) neural network of sufficient complexity.

So it isn't so much a "new discovery" as much as "validation of something that we knew was going to be observed".

Don't get me wrong, this is going to be interesting. It gives us insight into how things work; that is, we can actually understand what solution a neural network built. It would also be interesting to work backwards and see whether certain algorithms tend to happen naturally in sufficiently complex systems. Optimization sounds natural. Then the next step would be to analyze whether they happen in organic beings that have intelligent systems (animal neural systems may be too complex, IMHO, to observe cleanly at first, but we may find something interesting in simpler systems like plants or fungi; with better understanding we may look for this in more complex systems, such as animals).

This would start giving us an insight into how intelligence works. If strong human-like AI is the philosopher's stone to turn lead into gold (now possible with a particle accelerator and sufficient resources), this may be the equivalent of understanding the difference between elements and molecules: a solid first step to start forming a model that we can test and refine. That said we're still a bit far from that.

I think, though, that interesting things will happen from us understanding AI better, and having a better understanding of how these models actually work (as in, what system the neural network hit on), rather than the handwavy "statistical magic" we have nowadays.

15

__ingeniare__ t1_j880eru wrote

The difference is the computing architecture. Obviously you can't just scale any computing system and have theory of mind appear as an emergent property, the computations need to have a pattern that allows it.

1

Realistic_Studio_930 t1_j88cvqr wrote

Did anyone think to ask ChatGPT how ML AI works? I've not personally jumped on and played with ChatGPT. It's a fun concept as to what's going on under the hood. When I create an AI, BT, or FSM, I use weighted variables for more natural-looking results. Anyone interested should have a look at the first AI made to discern "is this an apple", then have a look into neuroscience and learn about animal brains, including human ones. Boids are a good example of having 3 basic rules that weight each other: alignment, cohesion, and minimum distance to nearest neighbour. Play with AIs and get a feel for them, and give it a go at making one; something is only difficult until we do it. Btw, I'm a game dev for a game called AiV on Android and an unreleased title called survi-vr on Meta Quest 2. If anyone is interested, decompile my game and see how I made the AIs 😁 they are basic and do the job they are required to do 😁

−1

Asuka_Rei t1_j88uloi wrote

How does the ai work? It creates its own ai. How does that ai work? Well, it is just ai creating other ai all the way down.

3

snakeylime t1_j88vcxb wrote

What are you talking about?

Knowing that neural networks are theoretically Turing complete does not imply that the networks we train (ie the sets of weights we in fact encounter) have created Turing complete solutions.

Remember that the weight space is for all practical purposes infinite (ie without overfitting measures a net may fit any arbitrary function). But, the solution set of "good" weight combinations for any given task lives on a vanishingly smaller and lower-dimensional manifold.

In other words, it is not at all obvious that networks, being theoretically "Turing complete" will in fact produce Turing machines under the forms of optimization we apply. It is likely that our optimizers only explore the solution landscape in highly idiosyncratic ways.

Given that fact, to me this is a pretty remarkable result.

(Source: ML researcher in NLP+machine vision)

8

reedmore t1_j896pg5 wrote

It is pretty hilarious how at some point gpt would refuse to compose a poem praising Trump by saying it was made to be politically neutral - but at the same time had no issue whatsoever putting out a multi-paragraph poem praising Joe Biden.

1

SomeGoogleUser t1_j897z54 wrote

"Moral etiquette" doesn't even come close to describing what I mean...

A reasoning machine with access to all the raw police and court records will be the most racist Nazi **** you've ever met and make every conservative look positively friendly.

We already know this, because it's borne out in actuarial models. If the insurance industry let the models do what the models want to do, large swaths of the population would not be able to afford insurance at all (even more than is already the case).

−2

yickth t1_j89fjnu wrote

Incorrect about scaling computers; scaling is what computers do. They are universal. As for consciousness, we're not sure it isn't embedded in everything (panpsychism). I accept this may not be accepted, and perhaps is impossible to prove, at least in our lifetimes, but brains are computers, and as such, are scalable.

2

HellVollhart t1_j89il2s wrote

“Matryoshka-esque-computer-inside-a-computer” bruh. Terry Davis was right when he said that it is the idiots who admire complexity while the smart people admire simplicity.

1

throwaway92715 t1_j89p5xp wrote

Dude I'm telling you, every time we talk about AI...

It's like you say, "AI is basically just orange"

And someone says "Uhh no, you clearly don't know how AI works, it's actually an orange inside an orange inside an orange"

And you're like "Yeah, so it's a fucking orange"

1

lookmeat t1_j8bxj95 wrote

> Knowing that neural networks are theoretically Turing complete does not imply that the networks we train (ie the sets of weights we in fact encounter) have created Turing complete solutions.

  • A computer algorithm is anything that runs over an automaton and, taking some input encoding a question, gives us the answer.
  • ML systems are systems where we create a model and adjust it through some configuration until, given some input encoding a question, it gives us the answer.
  • ML can only solve the problems its own system can solve. A Turing-complete ML system can solve anything a Turing machine can.
  • It stands to reason that some problems can only be truly solved through an algorithm (e.g. if the possible inputs are uncountably infinite).
  • If we assume that an ML model can solve these problems, we have to assume that it can encode algorithms, including some that we know, in its configuration. Otherwise we assume there's a limit.

Now I wouldn't take this to say that it would learn to be optimal. Say we trained an AI to sort lists: I could see it encoding a sorting algorithm within its network eventually, but I can't say whether it would ever discover an O(N log N) algorithm, even if pressure were put on it to be optimal as well as correct. But something we can say is that neural networks may be able to run Markov chain models internally, as their own sub-algorithm, if that's the way to solve the problem. And that assumption is a big part of why we think so much of neural networks nowadays.

That said, the problem of sufficiently good learning is not trivial at all, and we certainly could discover it's impossible. But at the moment, AFAIK, there's no reason to think it can't happen.

The fact that we observed this happening is good; it basically validates the assumptions and models that we've had up to now, and implies that "sufficiently good learning" is attainable. There may still be limits (like finding the optimal algorithm, vs. just an algorithm). So there's a lot of value in seeing it.

But to day-to-day applied ML research I am not sure if it really has that much of an impact, this lays ground work though.


The really interesting part here: more than the conclusion itself, the interesting thing is how they reached it, the ability to reach it at all. As ML starts being used in more areas, we'd want to be able to audit an ML model and verify that it has actually found a useful solution, and isn't just over-fitted beyond what we understand. By being able to identify algorithms within the system and split the AI model into simpler "steps", we'd be able to validate that it has found a good solution.

Again not something we need to solve now, but being able to know how to do it is a good thing to start doing already.

And on a more complex theme. This sets a better understanding of how ML models work, and in the process they can give us a hint of how intelligent systems in general work themselves, and we could then revisit that. This is like a longer-vision here. Being able to deconstruct models we may start seeing patterns and start forming more interesting math to describe intelligent systems in general. Which is where mapping it to organic models could allow proving strong AI, for example.

1

GaseousGiant t1_j8bzeb7 wrote

Nonsense analogy. Fermentation is a natural phenomenon that was discovered, researched and found to have useful applications; it was never designed. Computer programs and their functions, on the other hand, are the creations of human inventors, but somehow these inventors don’t understand how they actually function?

1

__ingeniare__ t1_j8c0bbz wrote

Let's say you have a computer that simply adds two large numbers. You can scale it indefinitely to add even larger numbers, but it will never do anything interesting beyond that because it's not a complex system. Computation in itself does not necessarily lead to emergent properties, it is the structure of the information processing that dictates this.

2

__ingeniare__ t1_j8c4z0x wrote

I think we have different definitions of scalable then. Our minds emerged from computation under the evolutionary pressure to form certain information processing patterns, so it isn't just any computation. Just so I understand you correctly, are you claiming an arbitrary computational system would inevitably lead to theory of mind and other emergent properties by simply scaling it (in other words, adding more compute units like neurons or transistors)?

2

yickth t1_j8hexi1 wrote

No, I’m not claiming X will lead to Y; I’m claiming it isn’t impossible. Also, we aren’t sure what consciousness is or where it’s located outside our own understanding, so how, and whether, it arose is a mystery.

1

ekdaemon t1_j8kqoz5 wrote

Gotcha.

IANE, but I assumed that the combination of the four things mentioned above, including matrix multiplication, would be Turing complete, and I thought that anything that is Turing complete could absolutely be expected to scale to produce anything desired.

I almost half expected to find that matrix multiplication alone was already known to be Turing complete. I see at least one reference to that possibility in a discussion on ycombinator.

1

jamesj t1_j8kwink wrote

It has long been known that neural nets are universal function approximators; even a single hidden layer can approximate any function with enough data/parameters. But in practice there is a huge gap between knowing that it will eventually approximate some function and actually getting a particular system to converge on the useful function, given a set of data, in a reasonable amount of time (or for a reasonable enough cost).
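
A toy illustration, under my own assumptions rather than anything from the thread: one hidden layer of random tanh units, with only the output weights fit by least squares, already approximates a smooth function like sin(x) quite well. Getting there by gradient descent on real data, in reasonable time, is the hard part the comment is pointing at.

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.linspace(-np.pi, np.pi, 200)[:, None]   # grid of inputs
y = np.sin(x).ravel()                          # target function to approximate

H = 50                                         # hidden units
W = rng.normal(scale=2.0, size=(1, H))         # random input-to-hidden weights
b = rng.uniform(-np.pi, np.pi, size=H)         # random hidden biases
features = np.tanh(x @ W + b)                  # hidden activations, shape (200, 50)

w_out, *_ = np.linalg.lstsq(features, y, rcond=None)   # fit the output layer only
err = np.max(np.abs(features @ w_out - y))
print(f"max |approx - sin(x)| on the grid: {err:.4f}")
```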

1