
KarmaStrikesThrice t1_j9vgzt4 wrote

AI is not computationally demanding to run. Training is the part that needs supercomputer-level resources for months on end, but once the neural network is trained, using it is comparatively cheap. How else would ChatGPT be able to serve 100+ million users at once if each user required a whole GPU's worth of resources?

2

nicuramar t1_j9xvqbs wrote

> AI is not computationally demanding to run

ChatGPT kinda is, due to the size of the neural network. But it’s all relative, of course.

6

KarmaStrikesThrice t1_j9y13vs wrote

But is it the size that is limiting, or the performance? ChatGPT is definitely too huge for one GPU (even the A100 server GPUs with 80GB of memory), but once you connect enough GPUs to have the memory available, I bet the inference itself is quite fast. It is similar to the human brain: it takes us days, weeks, or years to learn something, but we can then recall it in a split second. The fastest supercomputers today have tens of thousands of GPUs, so if ChatGPT can have millions of users running it at the same time, one GPU can serve hundreds or thousands of users.
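The "many users per GPU" part comes down to batching: one pass through the shared weights can process many users' inputs at once. A toy Python sketch of that idea (all shapes and numbers are made up, nothing from a real model):

```python
def matmul(batch, weights):
    """Multiply a (users x features) batch by a shared (features x outputs) weight matrix."""
    return [[sum(x * w for x, w in zip(row, col)) for col in zip(*weights)]
            for row in batch]

shared_weights = [[0.5, -1.0],
                  [1.0, 0.25]]   # one trained model, loaded once

batch = [[1.0, 2.0],
         [3.0, 4.0],
         [0.0, 1.0]]             # three "users", answered in a single pass

outputs = matmul(batch, shared_weights)
print(outputs[0])  # [2.5, -0.5]
```

The weights stay resident on the GPU; only the tiny per-user inputs and outputs change, which is why throughput scales with batch size rather than with the number of model copies.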

1

0382815 t1_j9x7rbi wrote

GPUs per user is lower than one, but ChatGPT definitely does not fit on just one GPU. I'm not sure I would call it simple.

2

ActuatorMaterial2846 t1_j9ydjnf wrote

Is this to do with advancements in model compression? I heard Emad Mostaque talk about this regarding Stable Diffusion.

2

KarmaStrikesThrice t1_j9zvqll wrote

No, I meant it more generally. Neural networks don't contain any super-complicated math that is difficult to solve; a network is made of simple cells whose inputs are the outputs of the previous layer of cells, and whose outputs are fed to the next layer. A popular example of a cell is the perceptron, which computes a simple linear function y = Ax + b. The main problem is the size of the network, which can be billions or even trillions of parameters in the case of ChatGPT. But not all cells are always used; based on the input, only some cells are active (the same way our brain does not activate the cells that learned math when we are asked, say, what the capital of New York state is).
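A minimal perceptron cell in Python, just to show how simple the per-cell math is (the weights, bias, and inputs here are invented for illustration):

```python
def perceptron(inputs, weights, bias):
    # Weighted sum of the previous layer's outputs: the Ax + b part...
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    # ...followed by a simple threshold activation.
    return 1 if s > 0 else 0

print(perceptron([1.0, 0.5], [2.0, -1.0], -0.5))  # 2.0 - 0.5 - 0.5 = 1.0 > 0, so 1
```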

So the most computationally difficult part is the learning, plus having enough memory to keep the whole network in fast memory; the AI doesn't know what you are about to ask it, so the whole network needs to be ready. But once we ask a specific question, like "are cats carnivores?", most cells remain effectively inactive and only those encoding information about biology, mammals, cats, food, meat, diets, carnivores, etc. are engaged and produce the answer. So extracting the output for a given input is much simpler, and could even be done by personal computers (if our computers had many terabytes or petabytes of RAM and storage, which they don't).
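A toy forward pass, with made-up weights, showing the "inactive cells" idea: a ReLU-style activation zeroes a cell, and a zeroed cell contributes nothing to the next layer (real models are far bigger, but inference is still just this kind of pass through fixed weights):

```python
def relu(values):
    return [max(0.0, v) for v in values]

def layer(inputs, weights, biases):
    # Each cell: weighted sum of the previous layer's outputs plus a bias.
    return [sum(w * x for w, x in zip(ws, inputs)) + b
            for ws, b in zip(weights, biases)]

x = [1.0, -2.0]                                            # made-up input
hidden = relu(layer(x, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
# hidden == [1.0, 0.0]: the second cell is "inactive" for this input
out = layer(hidden, [[2.0, 5.0]], [0.0])                   # inactive cell adds nothing
print(out)  # [2.0]
```

No gradients, no weight updates: that is the whole reason answering a question is so much cheaper than the months of training that produced the weights.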

Advanced compression algorithms reduce the memory required to store the network, but they don't really improve performance aside from some minor cache optimizations.

2