What about saving the dataset as batches in individual files, then using the data loader to load those files one batch at a time for the transformer? Just keep each batch small enough to fit in GPU memory.
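Here is a rough stdlib-only sketch of the idea, assuming the samples fit in a Python sequence before you split them. The helper names `save_batches` and `load_batches` are made up for illustration, and pickle is just a stand-in for whatever file format you actually use. The point is that only one batch file is read into memory at a time.

```python
import os
import pickle
import tempfile
from typing import Iterator, List, Sequence


def save_batches(samples: Sequence, batch_size: int, out_dir: str) -> List[str]:
    """Split samples into fixed-size batches and pickle each batch to its own file."""
    paths = []
    for i, start in enumerate(range(0, len(samples), batch_size)):
        path = os.path.join(out_dir, f"batch_{i:05d}.pkl")
        with open(path, "wb") as f:
            pickle.dump(list(samples[start:start + batch_size]), f)
        paths.append(path)
    return paths


def load_batches(paths: Sequence[str]) -> Iterator[list]:
    """Yield one batch at a time, so only a single batch file is held in memory."""
    for path in paths:
        with open(path, "rb") as f:
            yield pickle.load(f)


# Example: 10 samples, batch size 4 -> three files (4, 4 and 2 samples).
with tempfile.TemporaryDirectory() as tmp_dir:
    paths = save_batches(list(range(10)), batch_size=4, out_dir=tmp_dir)
    batches = list(load_batches(paths))
```

In PyTorch the same pattern would map onto a `Dataset` whose `__getitem__` loads one batch file, passed to `DataLoader` with `batch_size=None` so the loader does not re-batch the already-batched files.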
Any preprocessing/scaling could be done on the CPU side and would not consume much GPU memory.
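A minimal sketch of CPU-side scaling, assuming plain numeric features and standardization (z-scores) as the scaling step. The function names are hypothetical. It computes the global mean and standard deviation in one streaming pass over the batches (Welford's online algorithm), then scales each batch before it would be moved to the GPU. Using global statistics rather than per-batch ones keeps the scaling consistent across batches.

```python
import math
from typing import Iterable, List, Tuple


def streaming_mean_std(batches: Iterable[List[float]]) -> Tuple[float, float]:
    """Welford's online algorithm: one pass over all batches, O(1) extra memory."""
    count, mean, m2 = 0, 0.0, 0.0
    for batch in batches:
        for x in batch:
            count += 1
            delta = x - mean
            mean += delta / count
            m2 += delta * (x - mean)
    if count == 0:
        raise ValueError("no data to compute statistics from")
    return mean, math.sqrt(m2 / count)  # population standard deviation


def standardize(batch: List[float], mean: float, std: float) -> List[float]:
    """Scale one batch on the CPU; fall back to 1.0 if the data is constant."""
    scale = std if std > 0 else 1.0
    return [(x - mean) / scale for x in batch]


batches = [[1.0, 2.0, 3.0], [4.0, 5.0]]
mean, std = streaming_mean_std(batches)
scaled = [standardize(b, mean, std) for b in batches]
```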