masonw32 t1_jdsyi4v wrote
Reply to comment by ArcticWinterZzZ in Why is maths so hard for LLMs? by RadioFreeAmerika
This is only an issue for insanely large numbers though. GPT-4 already performs a ton of multiplications and additions in every layer of every forward pass. You can overfit a much smaller network on multiplication with full numbers as tokens, and a GPT-4-like architecture can learn to multiply full numbers for all practical purposes.
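As a rough illustration (a toy sketch with made-up sizes, e.g. operands 0..49 and a small MLP, not anything GPT-4 actually does): treat each operand and each product as a single token and let a tiny network memorize the whole table.

```python
# Toy sketch: overfit a small network on single-token multiplication.
# All sizes here (operand range, embedding width, MLP width) are arbitrary choices.
import torch
import torch.nn as nn

N = 50                                    # operands 0..49, each its own token
NUM_PRODUCTS = (N - 1) * (N - 1) + 1      # products 0..2401, each its own "token"

class TinyMultiplier(nn.Module):
    def __init__(self, d=128):
        super().__init__()
        self.emb = nn.Embedding(N, d)     # one embedding per operand token
        self.mlp = nn.Sequential(
            nn.Linear(2 * d, 512),
            nn.ReLU(),
            nn.Linear(512, NUM_PRODUCTS), # logits over every possible product
        )

    def forward(self, a, b):
        x = torch.cat([self.emb(a), self.emb(b)], dim=-1)
        return self.mlp(x)

# Enumerate the full multiplication table and memorize it.
a, b = torch.meshgrid(torch.arange(N), torch.arange(N), indexing="ij")
a, b = a.reshape(-1), b.reshape(-1)
y = a * b                                 # target "product token" is the product itself

model = TinyMultiplier()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(2000):
    loss = nn.functional.cross_entropy(model(a, b), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

acc = (model(a, b).argmax(-1) == y).float().mean()
print(f"train accuracy: {acc:.3f}")       # heads toward 1.0 as the table is memorized
```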
It's true that GPT-4 only does a constant number of operations per input though, and asymptotically the number of operations required to generate the output scales as O(n log n), where n is proportional to the input length. But this is not why it's failing.
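For the asymptotics, a rough way to write down the mismatch (my summary of the standard bounds, not something specific to GPT-4):

```latex
% Best known bound for multiplying two n-digit integers (Harvey & van der Hoeven, 2019):
%   T_{\mathrm{mult}}(n) = O(n \log n)
% A fixed-size autoregressive model spends a bounded amount of compute per generated
% token, so an m-token answer costs roughly c \cdot m, with c independent of n.
T_{\mathrm{mult}}(n) = O(n \log n), \qquad T_{\mathrm{model}}(m) = c \cdot m
```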