
ReasonablyBadass t1_iwnbmrx wrote

AFAIK most LLMs don't even use one epoch?

4

TheRealSerdra t1_iwo4w46 wrote

Technically aren’t you always doing at least one epoch? You’re doing one pass through all of your data at least, even if that data is less than the amount you could theoretically use.

7

ReasonablyBadass t1_iwoq0ug wrote

Not a complete one. GPT-3, I think, didn't complete its first pass-through.

12
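For anyone following along, here's a minimal sketch of what "less than one complete pass" usually means in practice: the run is capped by a token/compute budget rather than by finishing the corpus. The numbers below are invented for illustration, not GPT-3's actual figures, and there's no real model here, just the bookkeeping.

    # Toy illustration: training stops when the token budget is spent,
    # which can happen well before one full pass over the corpus.
    corpus_tokens = 500_000_000_000   # hypothetical corpus size (tokens)
    tokens_per_step = 2_000_000       # hypothetical batch_size * sequence_length
    token_budget = 300_000_000_000    # compute budget chosen up front

    seen = 0
    steps = 0
    while seen < token_budget:
        # each step consumes the next unseen chunk of the (shuffled) corpus,
        # so no example is ever repeated
        seen += tokens_per_step
        steps += 1

    print(f"steps taken: {steps}")
    print(f"fraction of one epoch completed: {seen / corpus_tokens:.2f}")

With these made-up numbers the run ends after seeing 60% of the data once, i.e. 0.6 of an epoch.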

zzzthelastuser t1_iwpi7r5 wrote

You could argue GPT-3 was trained on a subset of the available training data, no?

Not completing the first pass-through means the remaining data could be considered not part of the training data.

7

ReasonablyBadass t1_iwplk0c wrote

Semantics. It didn't see any of its data more than once, and it had more available. Not one full epoch.

9

zzzthelastuser t1_iwpltkw wrote

Sure, but in theory my little Hello World network also had more data available on the internet.

4

leondz t1_ix96sfz wrote

Yeah, this gives you an idea of how little of the data is actually worth going through - most of it repeats structures found elsewhere in the data, and isn't very diverse. Going through huge low-curation datasets is inefficient: the data diversity just isn't there.

1
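To make the "repeats structures found elsewhere" point concrete, here's a minimal sketch of exact-duplicate filtering with a content hash. Real curation pipelines use near-duplicate detection (e.g. MinHash/LSH) and much more besides; the documents below are placeholders.

    import hashlib

    # Toy illustration of why low-curation web data shrinks after cleaning:
    # exact-duplicate documents are dropped via a content hash.
    documents = [
        "the quick brown fox jumps over the lazy dog",
        "the quick brown fox jumps over the lazy dog",   # exact repeat
        "an entirely different sentence about training data",
    ]

    seen_hashes = set()
    deduplicated = []
    for doc in documents:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            deduplicated.append(doc)

    print(f"kept {len(deduplicated)} of {len(documents)} documents")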