BoiElroy t1_j9ipbtg wrote

This is not the answer to your question, but one intuition I like about the universal approximation theorem, which I thought I'd share, is the comparison to a digital image. You use a finite set of pixels, each of which can take on one of a fixed set of discrete values. With a 10 x 10 grid of pixels you can draw a crude approximation of a stick figure. With 1000 x 1000 you can capture a blurry but recognizable selfie. Within those finite pixels and the discrete values they can take, you can essentially capture anything you can dream of, including every frame of every movie ever made. Obviously there are other issues later, like whether your model's operational design domain matches the distribution of the training domain, or whether you just wasted a lot of GPU hours lol
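A quick back-of-the-envelope sketch of that intuition: the grid sizes come from the comment above, while the 256 gray levels are an illustrative assumption. Counting how many distinct images each grid can represent shows how fast the discrete space explodes.

```python
import math

def log10_num_images(width: int, height: int, levels: int = 256) -> float:
    """log10 of the number of distinct images on a width x height grid,
    where each pixel takes one of `levels` discrete values."""
    return width * height * math.log10(levels)

# Even the tiny 10 x 10 grid admits ~10^241 distinct images; the
# 1000 x 1000 grid admits ~10^2,400,000. Every photo, frame, and
# drawing at that resolution is one point in this finite discrete space.
print(f"10x10:     ~10^{log10_num_images(10, 10):.0f} images")
print(f"1000x1000: ~10^{log10_num_images(1000, 1000):.2e} images")
```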


BoiElroy t1_ixzz4z3 wrote

Honestly, look up Paperspace Gradient and consider their monthly service. They have a tier where you can quite routinely get decent free GPUs, which is perfect when you're just working up code, refactoring, and making sure a training run is actually going to run. Then, when you're ready to let something run overnight, you select an A6000 or whatever, and it's reasonably priced.
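A minimal sketch of that "make sure the training run actually runs" step before paying for the bigger GPU. The model, synthetic data, and `max_batches` cap are hypothetical placeholders, not anything Paperspace-specific; the idea is just to run a few forward/backward passes to catch shape bugs, device mismatches, and OOMs cheaply.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def smoke_test(model: nn.Module, loader: DataLoader, max_batches: int = 3) -> None:
    """Run a handful of training steps end to end before an overnight run."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)
    opt = torch.optim.Adam(model.parameters())
    loss_fn = nn.MSELoss()
    for i, (x, y) in enumerate(loader):
        if i >= max_batches:
            break
        opt.zero_grad()
        loss = loss_fn(model(x.to(device)), y.to(device))
        loss.backward()
        opt.step()
        print(f"batch {i}: loss {loss.item():.4f}")

# Tiny synthetic run: if this completes on a free-tier GPU, the full
# training script is at least structurally sound.
model = nn.Linear(8, 1)
data = TensorDataset(torch.randn(64, 8), torch.randn(64, 1))
smoke_test(model, DataLoader(data, batch_size=16))
```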
