Submitted by Fine-Topic-6127 t3_119ydqv in MachineLearning
martianunlimited t1_j9sh43x wrote
Not exactly what you are asking, but there is this paper on scaling laws which shows (assuming the training data is representative of the distribution), at least for large language models, how the performance of transformers scales with the amount of data, and compares it to other network architectures... https://arxiv.org/pdf/2001.08361.pdf We don't have anything similar for other types of data.
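As a rough illustration of the kind of result in that paper: it fits test loss to a power law in dataset size. Below is a minimal sketch, assuming the power-law form and the approximate constants reported by Kaplan et al. (2020); treat the numbers as illustrative rather than exact.

```python
import numpy as np

# Data scaling law from "Scaling Laws for Neural Language Models"
# (arxiv 2001.08361):  L(D) ~ (D_c / D) ** alpha_D
# Constants below are approximate values reported in the paper (assumed here):
ALPHA_D = 0.095   # exponent for dataset-size scaling
D_C = 5.4e13      # "critical" dataset size, in tokens

def predicted_loss(num_tokens: float) -> float:
    """Predicted cross-entropy test loss (nats/token) when dataset size,
    not model size, is the limiting factor."""
    return (D_C / num_tokens) ** ALPHA_D

# Example: how the predicted loss falls as the dataset grows 10x at a time
for tokens in (1e9, 1e10, 1e11):
    print(f"{tokens:.0e} tokens -> predicted loss ~ {predicted_loss(tokens):.2f}")
```

The point of the sketch is just that, under this fit, each 10x increase in data buys a roughly constant multiplicative reduction in loss, which is why the curves in the paper look like straight lines on log-log axes.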