Comments


IntelArtiGen t1_j7uce7z wrote

The CPU bottleneck depends on the model and the training process. If you remove all/most of the preprocessing done on the CPU, it could be fine. I don't think transformers usually bottleneck on the CPU, but an i7 7700K is quite old.

6

Available_Lion_652 OP t1_j7ue7pj wrote

My motherboard is quite old and the best CPU I can attach to it is an i7 7700K. From what I have read, if I preprocess the dataset before training, then it should not bottleneck. But what I was thinking is that the preprocessed dataset is held in 32 GB of RAM, and the CPU has to transfer data from RAM to GPU memory with only 8 threads. Let's say I want to train a GPT-2 from scratch. I do not know exactly how much the CPU/RAM frequency will bottleneck the training process. I don't want to change my whole hardware. If the RTX 3090 is too performant and the bottleneck is too high, I was wondering if I should buy a 3060/3080 instead.
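For context, a minimal sketch of the transfer path being described here, assuming a PyTorch DataLoader over an already-tokenized dataset held in RAM (shapes, sizes, and worker counts are made up for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical pre-tokenized dataset held in system RAM: token IDs as int64.
input_ids = torch.randint(0, 50257, (100_000, 1024), dtype=torch.int64)
dataset = TensorDataset(input_ids)

# pin_memory plus a few workers lets the CPU stage batches for fast host-to-GPU copies.
loader = DataLoader(dataset, batch_size=8, shuffle=True,
                    num_workers=4, pin_memory=True)

device = torch.device("cuda")
for (batch,) in loader:
    # non_blocking=True overlaps the copy with GPU compute when memory is pinned.
    batch = batch.to(device, non_blocking=True)
    # ... forward/backward pass would go here ...
    break
```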

1

pommedeterresautee t1_j7uwa71 wrote

At the start the weights are moved to the GPU. Then, during training, the tokenizer converts your strings to int64 tensors. Those are quite light, and they are moved to the GPU during training. What you need is not the fastest CPU but one that can feed your GPU faster than it consumes the data. In GPT-2's case, a CPU like the 7700K won't be an issue. Images or audio (TTS, ASR) may need more demanding preprocessing during training.
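As a rough sketch of that path with the Hugging Face tokenizer (placeholder strings, "gpt2" checkpoint assumed): the CPU's per-step job is just producing small int64 tensors.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

texts = ["some training text", "another example"]  # placeholder strings
batch = tokenizer(texts, padding=True, truncation=True,
                  max_length=1024, return_tensors="pt")

print(batch["input_ids"].dtype)   # torch.int64
print(batch["input_ids"].shape)   # (2, padded_sequence_length)

# The ID tensor is tiny compared with the model's weights and activations,
# so copying it to the GPU every step is cheap.
input_ids = batch["input_ids"].to("cuda")
```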

5

JustOneAvailableName t1_j7v99gd wrote

If the model is sufficiently large (if not, you don't really need to wait long anyways) and no expensive CPU pre/postprocessing is done, the 3090 will be the bottleneck.

A single 3090 might not have enough memory to train GPT-2 Large, but it's probably close.
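For a rough sense of why it's close, a back-of-the-envelope estimate (assuming plain fp32 training with Adam; activations, batch size, and sequence length add on top of this):

```python
# Back-of-the-envelope for GPT-2 Large (~774M parameters), fp32 + Adam.
params = 774e6
bytes_per_param = 4 + 4 + 8    # weights + gradients + Adam moment estimates
static_gb = params * bytes_per_param / 1024**3
print(f"~{static_gb:.1f} GB before activations")   # ~11.5 GB of the 3090's 24 GB
```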

Fully training an LLM on a single 3090 is impossible, but you could finetune one.

3

ggf31416 t1_j7waxlu wrote

It will depend on how much preprocessing and augmentation is needed. I don't think text needs much preprocessing or augmentation, but, for example, image classification or detection training creates a different augmented image on each iteration and will benefit from a more powerful processor.
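For instance, a typical CPU-side augmentation pipeline might look like this torchvision sketch (dataset path and transform choices are illustrative); every sample goes through it in the DataLoader workers on each epoch:

```python
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Each worker process applies these transforms on the CPU for every sample,
# producing a freshly augmented image every epoch.
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical ImageFolder layout; more num_workers helps hide the CPU cost.
train_set = datasets.ImageFolder("path/to/train", transform=train_tf)
loader = DataLoader(train_set, batch_size=64, shuffle=True,
                    num_workers=8, pin_memory=True)
```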

Note that you can also use cloud services. If you aren't dealing with confidential data, vast.ai is often one of the cheapest; otherwise you can use Lambda Labs, Google Cloud, AWS, or other services. At least in the case of Google Cloud and AWS, you have to request access to GPU instances, which may take some time.

2

YOLOBOT666 t1_j7wrm1z wrote

What about saving the dataset into batches as individual files, then using the data loader to load the files as batches for transformers? Keeping the batch size reasonable for the GPU memory.

For any preprocessing/scaling, this could be done on the CPU side and would not consume much memory.
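A minimal sketch of that idea, assuming the tokenized batches were saved ahead of time as one .pt file per shard (directory layout and file names are hypothetical):

```python
import glob
import torch
from torch.utils.data import Dataset, DataLoader

class ShardedTokenDataset(Dataset):
    """Loads pre-tokenized shards from disk instead of keeping everything in RAM."""

    def __init__(self, shard_dir):
        self.shard_paths = sorted(glob.glob(f"{shard_dir}/shard_*.pt"))

    def __len__(self):
        return len(self.shard_paths)

    def __getitem__(self, idx):
        # Each file holds one pre-batched int64 tensor of token IDs.
        return torch.load(self.shard_paths[idx])

loader = DataLoader(ShardedTokenDataset("tokenized_shards"),
                    batch_size=None,      # shards are already batched
                    num_workers=4, pin_memory=True)
```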

2

SnooHesitations8849 t1_j7y2wu2 wrote

You will be a little bit bottlenecked, or close to it. Just buy the damn thing and get to work. If it bottlenecks, just chill, or buy some older machine like the AMD 2000 series; they have more cores and are cheap.

2

ehlen t1_j80bfkq wrote

I have this exact setup (7700K & 3090). If you want me to try something out, I'm happy to run it.

2

Available_Lion_652 OP t1_j80lcuv wrote

I would really appreciate it if you could try to finetune a Flan-T5-XXL model from Hugging Face on your hardware. I am curious if it works and if there is a big bottleneck. Thank you.

1