Submitted by abhitopia t3_ytbky9 in MachineLearning
Hello,
I am new to this community. I am an ML researcher and a computer scientist. I have been interested in category theory and functional programming (Haskell in particular). I am also very interested in brain-inspired computation and do not believe that current deep learning systems are the way to go.
In recent years, a few papers have appeared that suggest how predictive coding can replace backpropagation-based systems.
While initial research focused only on MLPs, it has recently been applied to arbitrary computation graphs, including CNNs, LSTMs, etc.
As is typical of ML practitioners, I don't have a neuroscience background. However, I found this amazing tutorial for understanding predictive coding and how it can be used for actual computation:
A tutorial on the free-energy framework for modelling perception and learning
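To give a flavour of what the tutorial builds up to, here is a minimal NumPy sketch of the basic loop. This is my own illustrative toy (the layer sizes, names, and learning rates are arbitrary), not code from the tutorial: value nodes relax by gradient descent on local prediction errors, and the weights are then updated locally from those same errors.

```python
import numpy as np

# Minimal predictive-coding sketch (my own toy, not code from the tutorial).
# Layer l is predicted from the layer below via mu_l = W_l @ f(x_{l-1});
# eps_l = x_l - mu_l are prediction errors; the free (hidden) nodes relax by
# gradient descent on the total squared error, and the weights then update
# locally from the settled errors (error times presynaptic activity).

rng = np.random.default_rng(0)
f  = np.tanh
df = lambda v: 1.0 - np.tanh(v) ** 2

sizes = [4, 8, 2]                                   # input, hidden, output
W = [rng.normal(0, 0.3, (sizes[l + 1], sizes[l])) for l in range(2)]

def train_step(x_in, y_target, n_infer=100, dt=0.1, lr=0.01):
    # clamp input and target, initialise the hidden layer at its prediction
    x = [x_in, W[0] @ f(x_in), y_target]
    for _ in range(n_infer):                        # inference phase
        eps1 = x[1] - W[0] @ f(x[0])                # error at hidden layer
        eps2 = x[2] - W[1] @ f(x[1])                # error at output layer
        x[1] += dt * (-eps1 + df(x[1]) * (W[1].T @ eps2))
    # learning phase: purely local, Hebbian-like weight updates
    W[0] += lr * np.outer(eps1, f(x[0]))
    W[1] += lr * np.outer(eps2, f(x[1]))

# usage: push one (input, target) pair through a training step
train_step(rng.normal(size=4), np.array([1.0, -1.0]))
```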
To the best of my knowledge, no mainstream ML library (PyTorch or TensorFlow) currently supports predictive coding efficiently.
As such, I am interested in building a highly parallel and extensible framework to do just that. I think a future "artificial brain" will be like a server that is never turned off and can be scaled up (horizontally or vertically) on demand. After reading up, I found that Erlang is a perfect language for this, as it natively supports distributed computing, with millions of small independent processes that communicate with each other via lightweight message passing.
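To show the shape of the design I have in mind (one lightweight process per unit, all communication by messages), here is a deliberately tiny Python sketch using threads and queues. It is only an illustration of the architecture, not how I'd implement it; Erlang (or Actix) processes are far cheaper than OS threads, which is the whole point of using them.

```python
import threading, queue

# Toy sketch of the "one process per unit, message passing only" design.
# Real Erlang (or Actix) processes are far lighter than OS threads; this is
# only meant to show the shape of the architecture.

class Unit(threading.Thread):
    """A unit that waits for messages, applies its local weight, and forwards."""
    def __init__(self, weight, downstream=None):
        super().__init__()
        self.inbox = queue.Queue()       # each unit owns a private mailbox
        self.weight = weight
        self.downstream = downstream     # another unit's inbox, or None

    def run(self):
        while True:
            msg = self.inbox.get()       # block until a message arrives
            if msg is None:              # poison pill: forward it and stop
                if self.downstream is not None:
                    self.downstream.put(None)
                break
            out = self.weight * msg      # the unit's local computation
            if self.downstream is not None:
                self.downstream.put(out)
            else:
                print("output:", out)

# wire up a two-unit chain and push a value through it
u2 = Unit(weight=0.5)
u1 = Unit(weight=2.0, downstream=u2.inbox)
u2.start(); u1.start()
u1.inbox.put(3.0)                        # flows u1 -> u2, prints "output: 3.0"
u1.inbox.put(None)                       # shutdown propagates down the chain
u1.join(); u2.join()
```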
Digging further, it seems that someone even wrote a 1000-page book, Handbook of Neuroevolution Through Erlang. This book was written in 2012, before the advent of deep learning, and focuses on evolutionary techniques (like genetic algorithms).
My proposal is to take these ideas and build a general-purpose, highly parallel, scalable artificial neural network library (with first-class support for online/continual learning) using Erlang. I am looking for feedback or advice here, as well as for collaborators. So if you are interested, please reach out!
UPDATE [22-11-2022]: Considering using Rust and the Actix library instead, for performance reasons.
maizeq t1_iw5mh0v wrote
I will save you a significant amount of wasted time and tell you now that predictive coding (as it has been described, more or less, for 20 years in the neuroscience literature) is not equivalent to backpropagation in the way that Millidge, Tschantz, Song and co have been suggesting for the last two years.
It is extremely disheartening to see them continue to make this claim when they are clearly using a heavily modified version of predictive coding (called FPA PC, or fixed prediction assumption PC), which is so distinct from PC that it is a significant stretch to lend it the same name.
For one, predictive coding under the FPA no longer corresponds to MAP estimation on a probabilistic model (gradient descent on the negative log joint probability), so it loses its interpretation as a variational Bayes algorithm (something that, afaik, has not been explicitly mentioned by them thus far).
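For reference, this is what the standard (non-FPA) formulation buys you; my own paraphrase of the usual hierarchical Gaussian setup, as in e.g. the Bogacz tutorial:

```latex
% Hierarchical Gaussian model: p(x_l \mid x_{l+1}) = \mathcal{N}\!\bigl(x_l;\, f(W_l x_{l+1}),\, \sigma_l^2 I\bigr)
% Up to constants (and a prior term on the top layer), F is the negative log joint,
% so minimising it over the latents is MAP estimation:
F(x) \;=\; -\log p(x_0,\dots,x_L) + \mathrm{const}
     \;=\; \sum_{l} \frac{1}{2\sigma_l^2}\,\bigl\lVert x_l - f(W_l\, x_{l+1}) \bigr\rVert^2 + \mathrm{const}
% Standard PC inference is literally gradient descent on F:
\frac{dx_l}{dt} \;=\; -\frac{\partial F}{\partial x_l}
```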
Secondly, if you spend any appreciable time on predictive coding, you will realise that the computational complexity of FPA PC is at best equal to that of backpropagation (and in most cases significantly worse).
Thirdly, FPA PC requires "inverted" PC models in order to form this connection with backpropagation. These are models where high-dimensional observations (such as images) parameterise latent states, which no longer renders them generative models in the traditional sense.
FPA PC can really be understood as just a dynamical implementation of backprop (with very little actual connection to predictive coding), and this implementation of backpropagation is in many ways practically inefficient and meaningless. Let me use an analogy to make this clearer: say you want to assign the variable a the value f(x). You could simply write a = f(x). Or you could set a up to evolve according to da/dt = f(x) - a, whose fixed point (where it converges) is a = f(x). But if you can already compute f(x), the second route is just a roundabout way of doing the assignment.
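In code the analogy is just this (illustrative numbers only):

```python
# Direct assignment vs. relaxing a dynamical system whose fixed point is the
# same value: both end with a == f(x), one of them the long way round.
def f(x):
    return x ** 2 + 1.0

x = 2.0

a_direct = f(x)          # option 1: just assign it

a, dt = 0.0, 0.1         # option 2: integrate da/dt = f(x) - a to its fixed point
for _ in range(200):
    a += dt * (f(x) - a)

print(a_direct, a)       # 5.0 and ~5.0
```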
In the case of backpropagation, "a" corresponds to the backpropagated errors, and the dynamical update equation corresponds to the recursive equations that define backpropagation. I.e., we are assigning "a" the value of dL/dz, for a loss L (it's a little more than this, but I'm drunk, so I'll leave that to you to discern). If you look at the equations more closely, you find that this basically cannot be any more efficient than backpropagation, because the error information still has to propagate backwards, albeit indirectly. I would check out this paper by Robert Rosenbaum, which I think is quite fantastic if you want more nitty-gritty details, and which deflates a lot of the connections espoused between the two works, particularly from a practical perspective.
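Here is a toy numerical version of that point (my own illustration, not the FPA PC algorithm itself): the relaxed "error nodes" settle to exactly the dL/dz values that one backward sweep of backprop gives you directly, and every iteration of the relaxation still has to push error information backwards through W2.T.

```python
import numpy as np

# Toy illustration: the backpropagated errors dL/dz can be computed by the
# usual backward recursion, or read off as the fixed point of a relaxation --
# but the relaxation still passes error information backwards through W2.T
# at every iteration.

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), rng.normal(size=2)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))

# forward pass: z1 -> h -> z2, loss L = 0.5 * ||z2 - y||^2
z1 = W1 @ x
h  = np.tanh(z1)
z2 = W2 @ h

# standard backprop: one backward sweep
dL_dz2 = z2 - y
dL_dz1 = (1 - np.tanh(z1) ** 2) * (W2.T @ dL_dz2)

# "dynamical" version: relax error nodes until they settle at the same values
e2 = np.zeros_like(z2)
e1 = np.zeros_like(z1)
for _ in range(500):
    e2 += 0.1 * ((z2 - y) - e2)
    e1 += 0.1 * ((1 - np.tanh(z1) ** 2) * (W2.T @ e2) - e1)   # needs W2.T @ e2

print(np.allclose(e2, dL_dz2, atol=1e-3), np.allclose(e1, dL_dz1, atol=1e-3))
```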
I don't mean to be dismissive of the work of Millidge and co! Indeed, I think the original 2017 paper by Whittington and Bogacz was extremely interesting and a true nugget of insight (in terms of how PC with certain variance relationships between layers can approximate backprop, something which makes complete sense when you think about it), but the flurry of subsequent work that has capitalised on this subtle relationship has been (in my honest opinion) very misleading.
Also, I would not take any of what I've said as a dismissal of predictive coding in general. PC for generative modelling (in the brain) is extremely interesting, and may still be promising.