KaiSix88 t1_j1mt3vw wrote
Reply to comment by Shiningc in For the first time Open AI is investing in a small number of startups who they believe are "pushing the boundaries" of technology and AI. by ECommerce_Developer
Wow. Never thought I'd see other anti-optimization AI fans out there. I'd say, though, that the brain is based on probabilities, just not brute-force statistical methods. Neurons in the brain, after all, are just coincidence detectors, and that is a matter of probability.
Shiningc t1_j1nzac1 wrote
Because that's what you read on the Internet? Probabilities rely on human-made labels, and can only make a choice between A and B. That's not how human intelligence works, because it can come up with an entirely new label, say a new choice C that's not based on probabilities.
KaiSix88 t1_j1p9icr wrote
I was actually agreeing with you, and additionally I'm a neurocomputational researcher who studies biologically inspired algorithms.
Since apparently you don't understand the difference between statistical optimization and probabilities, I'll give you a lesson.
Statistical optimization is where you take a target vector, something like a one-hot vector (example: [1, 0, 0], where the 0th index is cat, the 1st index is dog, and the 2nd index is fish). Then you compare it against the model's guess with Euclidean distance or some other accuracy measure (example: [0.5, 0, 0] is the guess and [1, 0, 0] is the answer, so the distance is 0.5). Every algorithm and loss function is different.
You take this distance and say, "well hey, all I have to do to get the right answer next time is adjust all the parameters that led to that guess, and I'll always classify this input as a cat."
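The two steps above can be sketched in a few lines, using the toy numbers from the example (the learning rate is just an illustrative value):

```python
import numpy as np

# One-hot target from the example: index 0 = cat, 1 = dog, 2 = fish.
target = np.array([1.0, 0.0, 0.0])
guess = np.array([0.5, 0.0, 0.0])   # the model's current guess

# Euclidean distance between guess and target, as in the example.
distance = np.linalg.norm(target - guess)
print(distance)  # 0.5

# One optimization step: nudge the parameters (here, the guess itself,
# for simplicity) toward the target by a fraction of the error.
learning_rate = 0.1
guess = guess + learning_rate * (target - guess)
print(guess)  # a little closer to [1, 0, 0]
```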
As you have rightly stated, this requires a target vector, and that is the label. As you again rightly stated, this is not the way the brain works. Even if a handful of pixels were off I could get a completely different answer, and in fact that is the basis of adversarial attacks.
What you are incorrect about is your statement that probabilities need labels and that brains don't use probabilities. There are a couple of biologically inspired methods, but the type I study is sparse distributed codes. To your first incorrect statement: probabilities need populations, not labels. Labels are just easy ways of forcing models to express the probabilities the way we want to see them. Every sound wave has an infinite number of patterns that can be created from slices of it, ranging from the whole sound down to tiny portions of it. Every pattern has a probability of occurring again; whether it has a definitive label doesn't matter. The pattern exists with or without a label. This is what the brain takes advantage of.
The brain first discretizes patterns into finite ranges. The cones in your eyes, the stereocilia in your ears, the taste buds on your tongue: they all convert infinities into discretes. Even though there are infinite gradations of light or sound between two arbitrary points, your body evolved to sense specific points on the spectrum at specific intensities. You can think of it as breaking the infinite definition of the rainbow down into a 256-color resolution. In this way, your brain can reliably pick out patterns without having to be exact.
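As a toy sketch of that discretization (the wavelength range and bin count here are made up for illustration, not physiology):

```python
# Bin a continuous wavelength (nm) into 256 discrete "color" channels,
# like receptors sampling specific points on a continuous spectrum.
def discretize(wavelength_nm, lo=380.0, hi=750.0, bins=256):
    # Clamp into the sensed range, then map to a bin index 0..bins-1.
    w = min(max(wavelength_nm, lo), hi)
    return int((w - lo) / (hi - lo) * (bins - 1))

# Two nearby wavelengths land in the same bin: the code doesn't need
# the input to be exact, just close enough.
print(discretize(550.0), discretize(550.4))  # same bin for both
```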
It's a lot more complicated, and to be frank only theories exist at this point, but essentially your brain saves exactly what it sees. A pattern comes in, it saves it. Your neurons reach out and strengthen connections onto nerves that fire or give a signal. That is your neurons essentially saving information. The interesting part is that they have now created a pattern (code) of neurons that fire. So other neurons will save that code, but not just to make a new code for a new code's sake. This new code will be based on signals from combining other senses in the body. So in fact, the code that fired for the red ball falling to the ground will be different from the code for the green ball falling to the ground. They will be similar codes, but the color code from your vision makes each one distinct.
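A toy way to picture those combined codes, with completely made-up neuron indices (sets of active-neuron indices standing in for codes):

```python
# Each feature contributes a small sub-code of active neuron indices.
motion_falling = {3, 17, 42, 88}     # shared "falling" sub-code
color_red      = {5, 51}
color_green    = {9, 64}

# Combining senses yields similar-but-distinct full codes.
red_ball_falling   = motion_falling | color_red
green_ball_falling = motion_falling | color_green

# The two codes overlap heavily on the motion sub-code, while the
# color sub-codes keep them distinct.
shared = red_ball_falling & green_ball_falling
print(sorted(shared))                      # the common "falling" neurons
print(len(shared), len(red_ball_falling))  # 4 of 6 neurons shared
```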
So what do these codes have to do with probabilities? Let's take the falling-ball example. Not only are there codes for what a thing looks, smells, or feels like; there are also codes made from the last time a neuron fired. So if a ball is suspended in the air, and in the next frame of a movie you see it has gone down a little bit, then the next frame a little bit more: well shit, what's the probability it continues to go down? The reason your brain can predict this is that there is a general set of codes you developed early in life that lights up for falling objects. Regularly and reliably. This is supported by experimental data; you can look up the Bill Clinton neuron. A code similar enough across all falling objects that, if you've seen one falling object, you can predict other falling objects closely enough.
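A minimal sketch of that overlap-triggered prediction, again with hypothetical neuron indices and an arbitrary firing threshold:

```python
# A stored "gravity" code G fires a prediction whenever the code for
# the current frame overlaps it enough.
G = {3, 17, 42}              # learned code for falling objects
THRESHOLD = 2                # fire if at least 2 shared neurons

def predicts_falling(frame_code):
    return len(frame_code & G) >= THRESHOLD

ball_frame   = {3, 17, 60}   # noisy view of a new falling object
bird_gliding = {9, 17, 71}   # only 1 neuron in common with G

print(predicts_falling(ball_frame))    # True
print(predicts_falling(bird_gliding))  # False
```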
Their study, though, can seem to suggest that one face is saved in a single neuron. That is a bit misleading, because you'll actually have a whole field of neurons lighting up for a face; we can only test a few.
There are other consequences of sparse distributed codes. I really didn't mention the other features of sparsity or distribution, but you can look them up yourself. The sparsity alone has its own powerful implications; you can look up Pentti Kanerva for more info on that. This is already too long, though.
Regardless, don't confuse probabilities with statistical optimization. I did when I first started out, but I didn't look like an ass when talking to people about it. Good luck.
Shiningc t1_j1soczr wrote
There's still going to be an inherent limitation set in statistics and probabilities. As in, things don't always follow a "trend" or a "pattern". A trend could suddenly change in unexpected and surprising ways.
It could be that things like predicting the trajectory of a falling ball are based on statistics and probabilities when we use our "intuition". But we can also think about it in ways that would completely change how we would predict the trajectory. For example, we could learn that the wind can affect the trajectory of the ball. Or, as in the case of baseball, the pitcher could throw a "slider" to make the ball drop a lot faster than a normally thrown ball. And a person would never have to have seen a ball affected by the wind before to predict this. There were never any statistical samples. He can simply think about how the wind would affect the ball. So he predicted the trajectory not based on statistics, but by some kind of a new rule, perhaps one that closely resembles the laws of physics.
Our general thinking isn't necessarily based on statistics and probabilities. And that's why an AGI can't be developed from statistical and probabilistic methods alone.
KaiSix88 t1_j1ttc7j wrote
>So he predicted the trajectory not based on statistics, but by some kind of a new rule, perhaps one that closely resembles the laws of physics.
You actually hit it right on the money. That's where sparsity comes into play. It is technically probabilistic, though.
Imagine this, for example: I have 100 neurons, and only 3 neurons can turn on at a time. 100 choose 3 is 161,700 combinations. We'll call the code that lights up for a falling ball code G, for gravity. We'll also say that your motor cortex fires if at least 2 of the active neurons match.
The odds that any other code is an exact match are 1 in 161,700. It's very unlikely that anything other than a falling object overlaps with code G. However, there are noisy partial codes A, B, and C (3 choose 2) that can overlap with G.
Because these 100 neurons represent something similar in nature in the same part of the brain, the full-code variants of A, B, and C will have meaningful overlap with G, because they were formed from the same inputs. This leaves us with 3 × 98 fully overlapping codes: 98 variants per noisy partial, because 3 neurons need to be on at a time, each partial is missing 1 neuron, and there are 98 other neurons to choose from.
As you may have guessed by now, your windy variants are among those other overlapping codes. You can call that set W, for windy: only the codes with an overlap of 2. So now you have 295 (3 × 98 + 1) codes that can activate under falling conditions.
But even then, with that many codes between the full W set and code G, 295/161,700 is still less than a one percent probability of a random code triggering a gravity-related thought. In this scenario we haven't considered temporal codes, but this is enough to illustrate how implicit probabilities can arise out of sparse distributed codes.
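The arithmetic in this scenario checks out directly:

```python
from math import comb

# 100 neurons, 3 active at a time.
total_codes = comb(100, 3)
print(total_codes)               # 161700

# Each of the 3 noisy partials (3 choose 2) completes to a full code
# with any of the 98 remaining neurons, plus code G itself.
overlapping = 3 * 98 + 1
print(overlapping)               # 295

# Chance a random code lands in the overlapping set: well under 1%.
print(overlapping / total_codes)
```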
If you are actually interested in this field, Kanerva and locality-sensitive hashes will be right up your alley.
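For a taste of the locality-sensitive-hash idea, here's a minimal random-hyperplane sketch (the vectors, dimensions, and bit count are arbitrary): similar inputs tend to get similar bit signatures, so closeness survives hashing.

```python
import random

random.seed(0)
DIM, BITS = 8, 16

# One random hyperplane per signature bit.
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(BITS)]

def signature(v):
    # Each bit records which side of a hyperplane the vector falls on.
    return tuple(int(sum(p[i] * v[i] for i in range(DIM)) > 0) for p in planes)

def hamming(s, t):
    return sum(x != y for x, y in zip(s, t))

a = [1.0, 0.9, 0.0, 0.2, 0.0, 0.0, 0.5, 0.1]
b = [1.0, 1.0, 0.1, 0.2, 0.0, 0.0, 0.4, 0.1]    # close to a
c = [-1.0, 0.0, 0.9, -0.2, 0.7, 0.0, -0.5, 0.8]  # far from a

# Nearby vectors disagree on few signature bits; distant ones on many.
print(hamming(signature(a), signature(b)))  # small
print(hamming(signature(a), signature(c)))  # larger
```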