Submitted by Not-Banksy t3_126a1dm in singularity
I keep seeing that GPT-5 is currently training and will be done later this year, and I'm wondering what that actually looks like in practice.
What are the mechanics behind it? Is it an automated process? Is it possible to create efficiency via larger-scale shared computing?
ActuatorMaterial2846 t1_je8e3lg wrote
So what happens is they compile a dataset: basically a big dump of data. For a large language model that's mostly text, books, websites, social media comments, essentially as many written words as possible.
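If you want a rough picture of what that dataset step looks like in code, here's a toy sketch in Python: sweep up a pile of text files and flatten them into one long stream of token IDs. The folder name and the byte-level "tokenizer" are made up for illustration; real pipelines use web-scale crawls and learned tokenizers like BPE.

```python
# Toy sketch of the "dataset" step: gather raw text files and turn them
# into one long stream of token IDs the model can train on.
# The folder name and byte-level "tokenizer" are placeholders, not what
# any lab actually uses.

from pathlib import Path

def load_corpus(folder: str) -> str:
    """Concatenate every .txt file in a folder into one big string."""
    texts = []
    for path in Path(folder).glob("*.txt"):
        texts.append(path.read_text(encoding="utf-8", errors="ignore"))
    return "\n".join(texts)

def tokenize(text: str) -> list[int]:
    """Toy byte-level tokenizer: each byte becomes one token ID (0-255)."""
    return list(text.encode("utf-8"))

if __name__ == "__main__":
    corpus = load_corpus("my_text_dump")   # hypothetical folder of scraped text
    tokens = tokenize(corpus)
    print(f"{len(tokens):,} tokens ready for training")
```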
The training is done with what's called a neural network, in this case one built on the transformer architecture, and it runs on huge clusters of GPUs (graphics processing units) linked together. What happens inside the neural network during training is a bit of a mystery; 'black box' is the term often used, because the computations are so complex that not even the researchers understand exactly what the model has learned internally.
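Here's a toy version of that training loop in PyTorch, just to make the idea concrete. The model, data, and sizes are all placeholders (a real GPT-scale run uses thousands of GPUs for months), but the core loop is the same: predict the next token, measure the error, nudge the weights.

```python
# Tiny transformer language model trained on fake token data.
# Everything here is scaled down for illustration.

import torch
import torch.nn as nn

VOCAB, DIM, SEQ_LEN = 256, 128, 64   # tiny sizes for illustration

class ToyTransformerLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        layer = nn.TransformerEncoderLayer(d_model=DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(DIM, VOCAB)   # predicts the next token

    def forward(self, tokens):
        x = self.embed(tokens)
        # causal mask so each position only "sees" earlier tokens
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1)).to(tokens.device)
        x = self.encoder(x, mask=mask)
        return self.head(x)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = ToyTransformerLM().to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):                      # real runs: millions of steps
    batch = torch.randint(0, VOCAB, (8, SEQ_LEN + 1), device=device)  # fake data
    inputs, targets = batch[:, :-1], batch[:, 1:]
    logits = model(inputs)
    loss = loss_fn(logits.reshape(-1, VOCAB), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % 20 == 0:
        print(f"step {step}: loss {loss.item():.3f}")
```

All the "knowledge" ends up smeared across those weight matrices, which is why the inside is so hard to interpret and people call it a black box.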
Once the training is complete, the result is a big file of learned weights, usually just called a model. That model can then be refined and tweaked (fine-tuned) to behave a particular way before public release.
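And a rough sketch of that last step: the trained weights get saved to a checkpoint file (that file is effectively "the model"), and later they can be loaded back and fine-tuned on smaller, curated data to change how it behaves. The file name and the tiny stand-in model here are hypothetical.

```python
# Save trained weights, then load and fine-tune them -- placeholders throughout.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Embedding(256, 32), nn.Linear(32, 256))  # stand-in for a trained LM

# 1. Save the trained weights -- this checkpoint file is "the model" in practice.
torch.save(model.state_dict(), "my_model.pt")

# 2. Later: load the weights back and fine-tune on a smaller, curated dataset.
model.load_state_dict(torch.load("my_model.pt"))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)   # small LR for fine-tuning

fine_tune_batch = torch.randint(0, 256, (4, 16))             # fake "refinement" data
logits = model(fine_tune_batch)
loss = nn.functional.cross_entropy(logits.reshape(-1, 256), fine_tune_batch.reshape(-1))
loss.backward()
optimizer.step()
print(f"fine-tune loss: {loss.item():.3f}")
```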
This is a very very simple explanation and I'm sure there's an expert who can explain it better, but in a nutshell that's what happens.