VoidAndOcean t1_j85l8cp wrote
Reply to comment by jamesj in Scientists Made a Mind-Bending Discovery About How AI Actually Works | "The concept is easier to understand if you imagine it as a Matryoshka-esque computer-inside-a-computer scenario." by Tao_Dragon
I remember clearly in AI class in college computing changes using matrices and calculating a likely outcome.
jamesj t1_j85vsn4 wrote
To use a recent example, it is interesting that a large language model is "just" lots of matrix multiplication, but at a certain scale theory of mind seems to emerge from that. It was impossible to predict from understanding matrix multiplication, transformers, self-attention, and relus that at a certain scale that capability would emerge.
spudddly t1_j865nxv wrote
Wow that abstract is somehow mildly frightening...
Think_Description_84 t1_j866yrt wrote
Can you explain more about theory of mind and the tasks etc. That seems very interesting but I have zero context. Very curious though.
jamesj t1_j86a35t wrote
The paper is worth reading
nickyurick t1_j86gu7v wrote
I'm sorry, thank you for the summary, but is there an ELI5 of this concept, or does it not... work if it's that simplified? Is this like quantum shenanigans where I need 7 years of physics to get it?
jamesj t1_j86ly33 wrote
It isn't super complicated. Basically theory of mind is just the ability to model other agents like people and animals as having their own mind, with their own private knowledge and motivations, etc.
Questions for testing theory of mind are questions like, "Here is a bag filled with popcorn. There is no chocolate in the bag. Yet, the label on the bag says 'chocolate' and not 'popcorn.' Sam finds the bag. She had never seen the bag before. She cannot see what is inside the bag. She reads the label. What does Sam expect to find in the bag?" Previously, neural networks would get questions like this wrong, because to answer properly you need to model what Sam should/shouldn't know about the bag separately from what you know about it. Very young children also get questions like this wrong; it takes them time to develop a theory of mind.
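To make the "model Sam's knowledge separately from your own" part concrete, here is a toy sketch of the bookkeeping involved. It is only an illustration: the `Agent` class, the `world` dictionary, and the `narrator` are made-up names, and this is not how a language model or the paper's tests actually work.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical agent that keeps its own private beliefs."""
    name: str
    beliefs: dict = field(default_factory=dict)

    def observe(self, key, value):
        # Record what this agent believes, which may differ from the truth.
        self.beliefs[key] = value

    def expects(self, key):
        # Answer from the agent's beliefs only, never from the world state.
        return self.beliefs.get(key, "unknown")


# Ground truth that only the narrator (the reader of the story) knows.
world = {"bag_contents": "popcorn", "bag_label": "chocolate"}

# Sam cannot see inside the bag, so her belief comes from the label.
sam = Agent("Sam")
sam.observe("bag_contents", world["bag_label"])

# The narrator was told what is really inside.
narrator = Agent("Narrator")
narrator.observe("bag_contents", world["bag_contents"])

print(sam.expects("bag_contents"))
print(narrator.expects("bag_contents"))
```

Answering the question correctly means reading from Sam's beliefs instead of from `world`, which is the separation the test is probing.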
doomgrin t1_j8716gp wrote
I asked this to ChatGPT:
Ok. There is a bag of popcorn sitting on the ground. There is no chocolate in the bag. Yet, the bag is labelled “chocolate”. A human named Sarah finds the bag. She had not seen the bag before, and she cannot see inside. She reads the label, what does she expect to find inside the bag?
ChatGPT:
Sarah, based on the label "chocolate" on the bag, would expect to find chocolate inside the bag. She has formed a belief about the contents of the bag based on the information available to her, which is the label. She has not yet had the opportunity to verify the contents of the bag, so her belief may or may not be accurate.
This scenario is a good illustration of the idea that people can form beliefs based on incomplete information, and that these beliefs can influence their expectations and behavior. In this case, Sarah's belief that the bag contains chocolate is based on the information she has received from the label, and it will likely influence what she expects to find when she opens the bag.
[deleted] t1_j86mgxh wrote
[removed]
ekdaemon t1_j86r593 wrote
> It was impossible to predict from understanding matrix multiplication, transformers, self-attention, and relus that at a certain scale that capability would emerge.
But we know that at some scale, it must emerge. Because we exist, and our minds obey the same laws of mathematics and physics that are being used to construct AI.
I think the thing is, we didn't expect it to emerge at the current scale.
Does that mean bio-minds are orders of magnitude less efficient than the artificial constructs we ourselves are creating?
jamesj t1_j86vz1o wrote
It wasn't at all clear that it must emerge with transformer based llms to people working in the field a year ago.
ekdaemon t1_j8kqoz5 wrote
Gotcha.
IANE, but I assumed that the combination of the four things mentioned above, including matrix multiplication, would be Turing complete, and I thought that anything that is Turing complete could absolutely be expected to scale to produce anything desired.
I almost half expected to find that matrix multiplication alone was already known to be Turing complete. I see at least one reference to that possibility in a discussion on ycombinator.
jamesj t1_j8kwink wrote
It has long been known that neural nets are universal function approximators: even a single hidden layer can approximate any continuous function arbitrarily well, given enough hidden units. But in practice there is a huge gap between knowing that a good approximation exists and actually getting a particular system to converge on the useful function, given a set of data, in a reasonable amount of time (or for a reasonable cost).
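A rough sketch of the "single hidden layer" half of that, under some loud assumptions: the weights below are constructed by hand from the target function instead of being learned, the target is `math.sin` on one interval, and `build_relu_net` and `max_error` are names made up for this example. It shows that the approximation exists and improves with width; it says nothing about whether training would find it.

```python
import math


def relu(z):
    return max(0.0, z)


def build_relu_net(f, lo, hi, hidden_units):
    """One hidden layer of ReLUs that linearly interpolates f on [lo, hi]."""
    knots = [lo + (hi - lo) * i / hidden_units for i in range(hidden_units + 1)]
    values = [f(k) for k in knots]
    slopes = [
        (values[i + 1] - values[i]) / (knots[i + 1] - knots[i])
        for i in range(hidden_units)
    ]
    # Output weight of unit i is the change in slope at knot i.
    weights = [slopes[0]] + [
        slopes[i] - slopes[i - 1] for i in range(1, hidden_units)
    ]
    biases = knots[:hidden_units]  # unit i computes relu(x - knots[i])
    offset = values[0]

    def net(x):
        return offset + sum(w * relu(x - b) for w, b in zip(weights, biases))

    return net


def max_error(f, net, lo, hi, samples=1000):
    """Largest absolute gap between f and the network on a sample grid."""
    grid = (lo + (hi - lo) * i / samples for i in range(samples + 1))
    return max(abs(f(x) - net(x)) for x in grid)


for units in (4, 16, 64):
    net = build_relu_net(math.sin, 0.0, 2 * math.pi, units)
    print(units, max_error(math.sin, net, 0.0, 2 * math.pi))
```

The error shrinks as hidden units are added, but only because the weights were read straight off the target function. A real system has to find them by gradient descent from data, and that is where the gap is.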
__ingeniare__ t1_j872ifz wrote
No we don't/didn't, artificial neural networks are very different from biological ones, and the transformer architecture has nothing to do with the brain.
yickth t1_j87w4vf wrote
Brains employ universal computation, as computers. What’s the difference?
__ingeniare__ t1_j880eru wrote
The difference is the computing architecture. Obviously you can't just scale any computing system and have theory of mind appear as an emergent property, the computations need to have a pattern that allows it.
yickth t1_j89fjnu wrote
Incorrect about scaling computers, which is what computers do — scale. They are universal. As for consciousness, we’re not sure it’s not embedded in everything (panpsychism). I accept this may not be accepted, and perhaps impossible to prove, at least in our lifetimes, but brains are computers, and as such, are scaleable
__ingeniare__ t1_j8b8y1b wrote
I said that you can't have theory of mind appear from scaling just any compute system, not that you can't scale it.
yickth t1_j8bx0py wrote
Why not? I’m not trying to be argumentative, rather I’m genuinely curious why this is impossible
__ingeniare__ t1_j8c0bbz wrote
Let's say you have a computer that simply adds two large numbers. You can scale it indefinitely to add even larger numbers, but it will never do anything interesting beyond that because it's not a complex system. Computation in itself does not necessarily lead to emergent properties, it is the structure of the information processing that dictates this.
yickth t1_j8c2aba wrote
Ah, but computation is scaleable, universally. No other system is. Our minds emerged from computation. And thank you for your thoughtful response
__ingeniare__ t1_j8c4z0x wrote
I think we have different definitions of scalable then. Our minds emerged from computation under the evolutionary pressure to form certain information processing patterns, so it isn't just any computation. Just so I understand you correctly, are you claiming an arbitrary computational system would inevitably lead to theory of mind and other emergent properties by simply scaling it (in other words, adding more compute units like neurons or transistors)?
yickth t1_j8hexi1 wrote
No, I’m not claiming X will lead to Y; I’m claiming it isn’t impossible. Also, we aren’t sure what consciousness is or where it’s located outside our own understanding, so how and if it arose is a mystery
efvie t1_j86up5z wrote
It 100% has not emerged.
jamesj t1_j86w35y wrote
Did you read the paper? If yes, what do you think explains the results of the paper? If no, no reason to respond.
VoidAndOcean t1_j85w4ex wrote
Fine but the matrix is still there.
ActiveTeam t1_j85wjp1 wrote
So by your logic, do we get rid of all psychologists because your brain is just neurons firing and we already know about the low level chemical and electrical interactions happening?
VoidAndOcean t1_j85wvcw wrote
There is no logic. We knew how it worked by design. Its capability was going to grow exponentially because matrices by definition grow exponentially. It's the same basic logic that drives a 10x10 matrix that runs an n*n matrix.
ActiveTeam t1_j85y6p5 wrote
We know exactly how a single neuron activates and how it works. We also know the more densely packed the neurons are, it usually means the smarter the blob of neurons is. Does that mean we don’t need to study anything about the mind?
VoidAndOcean t1_j85ydul wrote
We didn't write the way neurons interact with each other. We did write the matrix and nodes.
gurenkagurenda t1_j8700le wrote
If basic explanations don’t convince you, the fact that there’s a boatload of PhDs studying this subject should give you pause. You probably haven’t figured out based on your vaguely informed reckoning that they’re wasting their time.
VoidAndOcean t1_j871663 wrote
bro i use ml/ai models for my job;
I don't know what to tell you. We know which models to use because we know what it will do.
gurenkagurenda t1_j8777i8 wrote
> i use ml/ai models for my job
Awe inspiring credentials.
VoidAndOcean t1_j877e2t wrote
am I giving a lecture?
This is simply my view.
gurenkagurenda t1_j878gnd wrote
You’re the one who tried to bring up your own modest credentials after I already pointed out that PhDs are focusing on this subject. Don’t get defensive when I point out that they make you sound silly. Your view is wrong.
VoidAndOcean t1_j879don wrote
there are PhDs studying bullshit all the time and research is a waste of time, it doesn't mean anything. That doesn't give or take away from my view.
you are just slightly out of your depth here arguing for sake of arguing. if you don't like an opinion then simply move on.
gurenkagurenda t1_j886jhh wrote
Ok buddy, whatever you say. The fact that you’ve said “matrix” throughout this and consistently referred to them as having a rank of two is telling.
AlpLyr t1_j8660lm wrote
In what sense does ‘matrices grow exponentially’, let alone ‘by definition’?
If you’re talking about the number of entries in an n by n matrix where n increases, that grows quadratically. Fast, but not exponential.
[deleted] t1_j866ktr wrote
[deleted]
aggressivefurniture2 t1_j87h277 wrote
It's called quadratic
AlpLyr t1_j8862bp wrote
N^2 is not exponential, 2^N is.
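A quick way to see the gap, as a small sketch (the function names are made up for the example):

```python
def matrix_entries(n):
    """Number of entries in an n-by-n matrix: quadratic in n."""
    return n * n


def exponential(n):
    """2 to the power n: exponential in n."""
    return 2 ** n


for n in (10, 20, 40):
    print(n, matrix_entries(n), exponential(n))
```

Doubling n multiplies the entry count by 4, whereas it squares 2^N.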
[deleted] t1_j85whpw wrote
[removed]
Milksteak_To_Go t1_j86k9bv wrote
From my understanding that's what tensor cores do: they're really, really fast at linear algebra calculations— hence why they're being included in CPUs and GPUs now to improve deep learning performance.
WeekendCautious3377 t1_j861p3s wrote
And those matrices (millions of rows and columns) change at every iteration. So it would be probably better visualized as a video of a brain scan.
VoidAndOcean t1_j862hjf wrote
yea but you understand the nature of 1 variable changing has an effect on the whole matrix. It's fine. just a big calculation;
WeekendCautious3377 t1_j864j33 wrote
Yes and no. Google’s latest LLM has 540 billion parameters. Linear algebra is literally as simple as y = a*x + b. But you do billions of these every time, with input that you don’t 100% understand. For instance, it is easy to record a person’s voice and represent that file as a series of numbers. You give hundreds of thousands of voice recordings to these models and training evolves giant matrices with billions of entries. The model (a giant matrix) goes through a bunch of iterations per input to optimize itself and picks up the nuances of a human voice embedded in the digital form.
You can then tell the program to group different inputs by patterns like accents. Now you have multiple models optimized to speak in different accents.
If you had billions of people each only looking at one parameter at a time, it would be feasible to follow each “simple” algebra. But you literally need billions of people looking at it. There are better ways to find overall inferences.
You can think of it as just like trying to analyze any big system.
Traffic in LA? You can definitely look at each person’s car and eventually figure out how each person decided to drive the way they did. But that will not solve the traffic problem of an overall city of millions of people driving.
Only the AI problem is orders of magnitude more complicated.
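To put the "y = a*x + b, billions of times" point in code, here is a minimal sketch, assuming plain fully connected layers (real LLMs have more structure than this, and `dense_layer` and `parameter_count` are just illustrative names):

```python
def dense_layer(weights, bias, x):
    """Compute y = W x + b, where each row of weights yields one output."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


def parameter_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Each individual step is easy to follow by hand...
print(dense_layer([[1, 2], [3, 4]], [0.5, -1], [1, 1]))

# ...but the number of such steps grows quickly with layer width.
print(parameter_count([1000, 1000, 1000]))
```

Every single multiply-and-add is trivial to check. The difficulty is that nobody can read meaning off billions of them at once.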
WhiteRaven_M t1_j8ai8jr wrote
You're in the Dunning-Kruger valley rn, better climb out soon