Viewing a single comment thread. View all comments

Disastrous_Elk_6375 t1_j2e6d6d wrote

> 92.98s/it

Are your CPUs fully used when training? You might want to check if this is running on GPU or not, those numbers are generally found on CPU training.

10

Glycerine t1_j2frh3o wrote

You're right it's poor. All 8 CPU's hit 100%.


As an update though:

I made a bunch of changes and reduces the dataset to 5 lines from wikipedia; reduced the PaLM size to about 25% of the original, and reduced the epoch times to 8.

It's phenomenal. Within < 30 minutes and a bunch of poking it can easily generate sensible sentences.


I dropped it onto lambda GPU A100 instance - it's silly fast

Edit:

As an example; I trained the model on 5 sentences, with a optimal length of ~128 chars. I ask for a word and see what it constructs.

The goal here is to see if it produces sensible sentences from real words:

With a known word the response is fairly stable:

 qu('example')
'example, wrote of violence as a necessary and some'
&gt;&gt;&gt; qu('example')
'example, wrote of violence as a necessary and some'
&gt;&gt;&gt; qu('example', 20)
'example, wrote of vi'
&gt;&gt;&gt; qu('example', 10)
'example, w'
&gt;&gt;&gt; qu('example', 50)
'example, wrote of violence as a necessary and some'

untrained words produce some interesting results. Prior to the <100 epochs of training it was saying nonsense:

tensor(0.0431, grad_fn=&lt;NllLoss2DBackward0&gt;)
&gt;&gt;&gt; qu('when')
'whent he wher a arevo-pociaty on indiviolent resis'
&gt;&gt;&gt; qu('when')
'whent he refuted Nechaev).  Other anarchists, some'
&gt;&gt;&gt; qu('but')
'but. how a free society might be brought about.  H'
&gt;&gt;&gt; qu('but')
'but.  The there is also ofowerat; there is no [[co'
5

Disastrous_Elk_6375 t1_j2ft5zo wrote

> You're right it's poor. All 8 CPU's hit 100%.

Yeah, you're probably not using the gpu. Make sure that your pytorch & cuda stuff are compatible and properly installed. To test, go into a python session, and do


torch.cuda.is_available()

If the output is false it will train on CPU.

7