
turnip_burrito t1_j8byoth wrote

It surpasses humans on this science test across the board (natural science, social science, language science, etc.).

Wow.

And it outperforms GPT-3.5 with about 0.4% of the parameter count.

Wonder how it does on other tests?

Would this run on consumer graphics cards, then? Seems like it's in the ballpark to run on a single 3090, but without knowing the total requirements, I can't say.
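
A rough back-of-the-envelope sketch (assuming ~700M parameters, i.e. roughly 0.4% of GPT-3.5's 175B, and fp16 weights; the exact figures aren't stated in this thread):

```python
# Rough VRAM estimate for holding the weights at inference time.
# Assumptions: ~700M parameters (0.4% of 175B) and 2 bytes/param (fp16).
params = 0.004 * 175e9          # ~7.0e8 parameters
bytes_per_param = 2             # fp16
weights_gib = params * bytes_per_param / 1024**3
print(f"~{weights_gib:.1f} GiB for weights alone")  # ~1.3 GiB

# Activations, buffers, and framework overhead add more, but even a few GiB
# of headroom would be well within a 24 GiB RTX 3090.
```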

Edit: "Our experiments are run on 4 NVIDIA Tesla V100 32G GPU" - paper


Paper link: https://arxiv.org/pdf/2302.00923.pdf#page=7

85

blueSGL t1_j8c26i1 wrote

> Experimental Settings

> As the Multimodal-CoT task requires generating the reasoning chains and leveraging the vision features, we use the T5 encoder-decoder architecture (Raffel et al., 2020). Specifically, we adopt UnifiedQA (Khashabi et al., 2020) to initialize our models in the two stages because it achieves the best fine-tuning results in Lu et al. (2022a). To verify the generality of our approach across different LMs, we also employ FLAN-T5 (Chung et al., 2022) as the backbone in Section 6.3. As using image captions does not yield significant performance gains in Section 3.3, we did not use the captions. We fine-tune the models up to 20 epochs, with a learning rate of 5e-5. The maximum input sequence length is 512. The batch sizes for the base and large models are 16 and 8, respectively. Our experiments are run on 4 NVIDIA Tesla V100 32G GPUs.

So the GPUs were used for training; nothing there says what the system requirements will be for inference.
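
For reference, a minimal fine-tuning sketch mirroring those hyperparameters with Hugging Face `transformers`. This only covers the text backbone and its settings; the paper's fused vision features and ScienceQA preprocessing are not reproduced here, and the checkpoint name is an assumption based on the UnifiedQA initialization they describe:

```python
from transformers import (
    AutoTokenizer,
    T5ForConditionalGeneration,
    Seq2SeqTrainingArguments,
)

# Assumption: UnifiedQA's public T5 checkpoint (the paper initializes from UnifiedQA).
checkpoint = "allenai/unifiedqa-t5-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = T5ForConditionalGeneration.from_pretrained(checkpoint)

# Hyperparameters quoted from the paper: up to 20 epochs, learning rate 5e-5,
# max input length 512, batch size 16 for the base model (8 for the large one).
training_args = Seq2SeqTrainingArguments(
    output_dir="mm-cot-base",
    num_train_epochs=20,
    learning_rate=5e-5,
    per_device_train_batch_size=16,
)
```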

25

phira t1_j8c3mqx wrote

Hrm. The 512-token limit on the input might explain the performance relative to the parameter count.
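
To illustrate, a small sketch of what a 512-token cap means in practice (the tokenizer checkpoint here is just an assumption for demonstration):

```python
from transformers import AutoTokenizer

# Assumption: a T5-style tokenizer, used only to illustrate the 512-token cap.
tokenizer = AutoTokenizer.from_pretrained("t5-base")

long_input = "some very long question with lots of context ... " * 200
encoded = tokenizer(long_input, max_length=512, truncation=True)

# Anything beyond 512 tokens is simply dropped before the model ever sees it,
# so long contexts or many in-context examples won't fit.
print(len(encoded["input_ids"]))  # 512
```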

9

rixtil41 t1_j8ctrij wrote

I wonder how much more efficient these models can get?

2

WithoutReason1729 t1_j8c97b1 wrote

GPT-2 XL is 1.5 billion parameters. Unless they added some very computationally expensive change to this new model that's unrelated to the parameter count, this could definitely run on consumer hardware. Very very cool!

21

Red-HawkEye t1_j8cd8mp wrote

God damn, Amazon has entered the game.

Just when you think you've seen it all, an atomic bomb like this gets announced.

This is like the villain Black Frieza coming back in Dragon Ball and one-shotting the main characters.

Amazon's GPT one-shots ChatGPT and Google's LaMDA out of nowhere.

36

grimorg80 t1_j8ctetu wrote

I want to see it in action out in the open, though.

6

DarkCeldori t1_j8cz4sg wrote

While we're on the topic of consumer hardware, Ryzen AI (XDNA) seems promising, since it can make use of main system memory, which will soon easily reach 256 GB. That can fit very large models, and inference is usually far less computationally intensive than training.

3

gangstasadvocate t1_j8d43oc wrote

Apparently one of my professors, well, her husband, has a home-grade computer system with 2 TB of RAM. I tried searching for it, and that only seems to come in server-class machines, but yeah.

3

DarkCeldori t1_j8dk62j wrote

I think some Threadripper Pro workstations can reach up to 2 TB of RAM. It will be very good once Threadrippers come with Ryzen XDNA AI built in, since that can use main memory directly for AI tasks.

1

Tiamatium t1_j8i6ien wrote

Yeah, 2 TB of RAM is doable with server/workstation hardware. Think Threadripper or Xeon for the CPU.

1

NapkinsOnMyAnkle t1_j8e4p9m wrote

I've trained CNNs of up to 200M parameters on my laptop's 3070 without any issues. I think it only has around 5 GB of available VRAM.

This is a big concern of mine: that AGI actually requires an insurmountable amount of VRAM to train in a realistic timeframe and is therefore essentially impossible. I mean, we could calculate all of these models by hand to train them and then use them to make predictions, but it would take forever, like literally forever!
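
For a rough sense of why a 200M-parameter model can squeeze onto that card, here's a back-of-the-envelope sketch of training memory (assuming fp32 weights and plain Adam; activation memory is omitted because it depends on batch size and architecture):

```python
# Rough training-memory estimate: weights + gradients + Adam moment estimates.
# Assumptions: 200M parameters, fp32 (4 bytes) everywhere, plain Adam
# (two extra fp32 tensors per parameter). Activations come on top.
params = 200e6
weights   = params * 4
gradients = params * 4
adam_m_v  = params * 8   # first and second moments

total_gib = (weights + gradients + adam_m_v) / 1024**3
print(f"~{total_gib:.1f} GiB before activations")  # ~3.0 GiB

# That's already tight on ~5 GiB of VRAM; mixed precision, smaller batches,
# or gradient checkpointing are what keep it trainable in practice.
```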

3