Submitted by sharp7 t3_xu70v4 in MachineLearning
Are there any small problems that I can use to test out transformers?
Something simple and easy to program, so that I can use a small transformer net and gauge its performance on my regular computer without the need for nutty training times.
My ultimate goal is to then modify the network and see if I can get similar/better results with some tweaks.
For example, for a normal network I do simple function approximation, like predicting y from x where the target y is actually: cos(sin(10*(x^2))^3)
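A rough sketch of that kind of toy regression setup, in case it helps to see it spelled out. It assumes the cube applies to the sine, i.e. cos((sin(10*x^2))^3); the names `target` and `make_dataset` and the sampling range [-1, 1] are made up for illustration.

```python
import math
import random


def target(x):
    # Assumed reading of the formula: the cube is applied to the sine,
    # i.e. cos((sin(10 * x^2))^3).
    return math.cos(math.sin(10 * x ** 2) ** 3)


def make_dataset(n, lo=-1.0, hi=1.0, seed=0):
    # Sample n inputs uniformly from [lo, hi] (range is an assumption)
    # and pair each one with its target value.
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return [(x, target(x)) for x in xs]


if __name__ == "__main__":
    for x, y in make_dataset(5):
        print(f"x={x:+.3f}  y={y:.3f}")
```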
But to test transformers I would probably need a finite sequence -> finite sequence testing function, and I can't think of one easily.
adam_jc t1_iqui2g0 wrote
you can do n-digit addition of positive integers as a sequence where each digit is a token, i.e.
the problem 946 + 82 = 1028 could be made into the sequence:
9 | 4 | 6 | + | 0 | 8 | 2 | = | 1 | 0 | 2 | 8
(you could also omit + and = tokens).
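Here is a minimal sketch of that encoding, not the minGPT code itself. It assumes both operands are zero-padded to n digits and the sum to n+1 digits, which matches the 946 + 82 example; `encode_addition` and `make_examples` are hypothetical helper names.

```python
import random


def encode_addition(a, b, n_digits, with_operators=True):
    # Turn a + b into a list of single-character tokens.
    # Operands are zero-padded to n_digits, the sum to n_digits + 1
    # (an assumption that matches the example in this comment).
    if not (0 <= a < 10 ** n_digits and 0 <= b < 10 ** n_digits):
        raise ValueError("operands must be non-negative and fit in n_digits")
    left = str(a).zfill(n_digits)
    right = str(b).zfill(n_digits)
    total = str(a + b).zfill(n_digits + 1)
    if with_operators:
        return list(left) + ["+"] + list(right) + ["="] + list(total)
    # Fixed-width fields make the + and = tokens optional.
    return list(left + right + total)


def make_examples(n_digits, count, seed=0):
    # Draw random operand pairs and encode each one as a token sequence.
    rng = random.Random(seed)
    upper = 10 ** n_digits - 1
    return [
        encode_addition(rng.randint(0, upper), rng.randint(0, upper), n_digits)
        for _ in range(count)
    ]


if __name__ == "__main__":
    print(" | ".join(encode_addition(946, 82, 3)))
```

Since every field has a fixed width, each sequence has the same length, so there is no padding or variable-length handling to worry about.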
Andrej Karpathy uses this task in his minGPT repo.
edit: also in that repo he does character level training on a tiny dataset of Shakespeare writing
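For the character-level version, the data prep can be sketched roughly like this. It is an illustration under my own assumptions rather than the code from that repo: the helper names, the `block_size` parameter, and the placeholder text are all made up, and in practice you would load the real text file.

```python
def build_vocab(text):
    # One token per distinct character, with ids assigned in sorted order.
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text, stoi):
    return [stoi[ch] for ch in text]


def decode(ids, itos):
    return "".join(itos[i] for i in ids)


def make_pairs(ids, block_size):
    # Each input window is paired with the same window shifted one step
    # to the right, so the model learns to predict the next character.
    return [
        (ids[i:i + block_size], ids[i + 1:i + block_size + 1])
        for i in range(len(ids) - block_size)
    ]


if __name__ == "__main__":
    text = "to be or not to be"  # placeholder text, not the real dataset
    stoi, itos = build_vocab(text)
    ids = encode(text, stoi)
    x, y = make_pairs(ids, block_size=4)[0]
    print(repr(decode(x, itos)), "->", repr(decode(y, itos)))
```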