Submitted by 00001746 t3_1244q71 in MachineLearning
Craksy t1_jdywiwi wrote
Reply to comment by antonivs in [D] FOMO on the rapid pace of LLMs by 00001746
Well, that doesn't really contradict the previous comment. They did mention fine-tuning as an exception. GPT even stands for Generalized Pretrained Transformer. I'm sure some people like to draw hard lines between transfer learning/specialisation/fine-tuning (different task or just different data), but at any rate, what you're describing can hardly be considered "training from scratch".
Indeed, very few will ever need to train models at that scale. In fact, that was the whole motivation behind GPT: training LLMs from scratch consumes a tremendous amount of resources, and 99% of that work goes into building a foundation that happens to generalize very well across many different tasks.
kalakau t1_jdzb1jx wrote
> Generalized Pretrained Transformer
this is pedantic but it's actually Generative PT
Craksy t1_jdzbgzj wrote
Not at all.
While it doesn't mean the world for the point I was trying to make, it does change the meaning quite a bit.
Thank you for the correction
antonivs t1_je1cuw1 wrote
My description may have been misleading. They did the pretraining in this case. The training corpus wasn't natural language; it was a large set of executable definitions written in a company DSL, created by customers via a web UI.
Craksy t1_je3tzt3 wrote
Aah, got you. My bad. Well, I suppose most people mainly think of NLP in this kind of context. That's where my mind went, anyway.
Training from scratch on a DSL is indeed an entirely different scale of problem (assuming it's not some enormous, complex DSL that relies heavily on context and thousands of years of culture to make sense of).
Sounds very interesting, though. If you're allowed to share more information, I'd love to hear about it.
antonivs t1_je82r3j wrote
Well, I do need to be a bit vague. The main DSL has about 50 instructions corresponding to actions to be performed. There's also a separate sub-DSL, with about 25 instructions, that represents key features of the domain model and allows particular scenarios to be defined and then recognized during execution.
Both DSLs are almost entirely linear and declarative, so there's no nested structure, and the only control flow is a conditional branch instruction in the top-level DSL, to support conditional execution and looping. The UI essentially acts as a wizard, so that users don't have to deal with low-level detail.
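To give a rough sense of the shape without revealing specifics, here's a toy Python sketch. The instruction names and the little interpreter are invented for illustration, not the real thing; it just shows how a flat instruction list plus a single conditional branch is enough to get both conditional execution and looping:

```python
# Toy sketch only: hypothetical instruction names and interpreter, not the actual DSL.
# Programs are flat lists of (opcode, args); the only control flow is BRANCH_IF,
# which jumps backwards to loop or forwards to skip a block.

def run(program, context):
    """Execute a linear program; non-branch instructions are plain actions."""
    pc = 0  # program counter
    while pc < len(program):
        op, args = program[pc]
        if op == "BRANCH_IF":
            cond_key, target = args
            pc = target if context.get(cond_key) else pc + 1
        else:
            # No nesting: every other instruction is just an action with arguments.
            context.setdefault("log", []).append((op, args))
            pc += 1
    return context

# Hypothetical example: repeat two actions until a flag in the context flips.
# (In a real run, the flag would change as a side effect of the waiting step.)
example = [
    ("SEND_REMINDER", {"channel": "email"}),   # invented action names
    ("WAIT", {"hours": 24}),
    ("BRANCH_IF", ("not_yet_confirmed", 0)),   # loop back to the start
    ("CLOSE_CASE", {}),
]

print(run(example, {"not_yet_confirmed": False}))
```

Keeping control flow down to that one branch instruction is a big part of why the wizard-style UI works: there's no nesting for users to reason about.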
There are various ideas for the GPT model, including suggesting instructions when creating a program, self-healing when something breaks, and finally generating programs from scratch based on data that we happen to already collect anyway.
NLP will probably end up being part of it as well - for that, we'd probably use the fine-tuning approach with an existing language model as you suggested.
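Nothing is built on that front yet, but roughly speaking the fine-tuning route could look something like the sketch below, using Hugging Face transformers. The model choice, file name, and hyperparameters are placeholders, not decisions we've actually made:

```python
# Rough sketch of fine-tuning an existing causal LM on DSL text.
# "gpt2", "dsl_programs.txt", and the hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# One serialized DSL program per line in a plain-text file (hypothetical path).
dataset = load_dataset("text", data_files={"train": "dsl_programs.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="dsl-gpt2",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```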
visarga t1_jdzt9gd wrote
> Generalized 😓