no_witty_username t1_ivffk9p wrote on November 7, 2022 at 4:09 PM

All current evidence points to it being quite possible for even one person to make a high-quality model all by themselves. It will take great effort in high-quality data curation, but I don't see anything that is out of reach. The only reason this field perceives a large dataset as a requirement is that a large amount of data was used to train the base model. But what folks don't seem to understand is that the quality of the data used to train the base model was EXTREMELY poor. Bad captions, bad cropping, redundancies, miscategorizations, and a plethora of other issues plague the training data. The base SD model could have been trained with orders of magnitude less data had due diligence been applied to data curation.

This is the case for Stable Diffusion. I would not be surprised if the same were true of other models as well.