Submitted by AutoModerator t3_ybjvk5 in MachineLearning
ResponsibleHouse7436 t1_iu00jwz wrote
How's it going. I'm currently training some speech recognition models and doing research on novel encoder architectures for end-to-end (E2E) ASR, but I don't have a lot of compute. My final model will be around 300M parameters. Is training a few candidate architectures at, say, 25-50M parameters and then scaling up the best one a valid approach to this problem? Why or why not?
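To make the proposal concrete, here is a rough sketch (all configs and the helper function are hypothetical, not from the post) of how one might size a transformer-style encoder at small scale and then scale it up to the ~300M target by widening and deepening. The count covers only the attention and feed-forward matrices, which dominate the total:

```python
def encoder_params(d_model: int, n_layers: int, ffn_mult: int = 4) -> int:
    """Rough parameter count for a transformer encoder:
    attention projections + feed-forward weights only
    (ignores embeddings, layer norms, and biases)."""
    attn = 4 * d_model * d_model                 # Q, K, V, and output projections
    ffn = 2 * d_model * (ffn_mult * d_model)     # two feed-forward matrices
    return n_layers * (attn + ffn)

# small "architecture search" configuration: ~21M params
small = encoder_params(d_model=384, n_layers=12)

# scaled-up configuration: ~300M params (wider and deeper)
large = encoder_params(d_model=1024, n_layers=24)

print(small, large)
```

One caveat worth noting: architecture rankings at 25-50M params do not always hold after a >10x scale-up, so keeping the width-to-depth ratio similar between the small and large configs reduces that risk.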