Submitted by kkimdev t3_124er9o in MachineLearning
SOTA LLMs are getting too big, and often aren't even publicly available. For an individual researcher who wants to try different pre-training strategies/architectures and potentially publish meaningful research, what would be the best way to proceed? Is there a smaller model suitable for this, one whose results people would still take seriously?
asdfzzz2 t1_jdzbuav wrote
"Cramming: Training a Language Model on a Single GPU in One Day" (https://arxiv.org/abs/2212.14034) might be a good starting point.
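
For a sense of the scale involved: the paper trains a BERT-style model within a one-GPU-day compute budget. Below is a minimal sketch of instantiating a comparably small masked-LM with the HuggingFace transformers library; the hyperparameters are illustrative assumptions, not the paper's exact setup.

```python
from transformers import BertConfig, BertForMaskedLM

# Scaled-down BERT-style config in the spirit of a single-GPU budget.
# These values are illustrative, not the cramming paper's exact recipe.
config = BertConfig(
    vocab_size=32768,             # vocab size is an assumption
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
    max_position_embeddings=128,  # short sequences keep per-step cost low
)

model = BertForMaskedLM(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")
```

At roughly 100M parameters, a model like this fits comfortably on a single consumer GPU, which is what makes controlled ablations of pre-training strategies feasible for an individual researcher.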