Unlucky_Excitement_2 t1_jdavhcr wrote on March 23, 2023 at 1:50 AM

Reply to comment by KerfuffleV2 in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph

Bro, what are you talking about, LOL. It's context length he's discussing. There are multiple ways to extend it (I'm experimenting with all of them):

flash attention
strided context window
fine-tuning on a dataset with longer sequences
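For reference, here's a minimal sketch of the strided context window idea. It assumes "strided context window" means running a fixed-context model over overlapping windows of a long token sequence, as in the usual sliding-window perplexity evaluation. `strided_windows`, `window`, and `stride` are hypothetical names, not from any library. Note that this doesn't raise the model's native context limit. It only lets each window reuse `window - stride` tokens of earlier context.

```python
from typing import Iterator, List, Tuple


def strided_windows(
    tokens: List[int], window: int, stride: int
) -> Iterator[Tuple[List[int], int]]:
    """Hypothetical helper (not a library API).

    Yields (chunk, n_new) pairs covering `tokens` with overlapping windows.
    Each chunk holds at most `window` tokens, and consecutive chunks start
    `stride` tokens apart, so each one carries (window - stride) tokens of
    earlier context. `n_new` counts the tokens at the end of the chunk that
    no previous window covered, e.g. the ones to score for perplexity.
    """
    if not 0 < stride <= window:
        raise ValueError("need 0 < stride <= window")
    prev_end = 0
    for start in range(0, len(tokens), stride):
        end = min(start + window, len(tokens))
        yield tokens[start:end], end - prev_end
        prev_end = end
        if end == len(tokens):  # last window reached the end of the sequence
            break


for chunk, n_new in strided_windows(list(range(10)), window=4, stride=2):
    print(chunk, n_new)
```

With `window=4, stride=2` over 10 tokens this yields four chunks, and the `n_new` counts (4, 2, 2, 2) sum to 10, so every token is scored exactly once while still seeing some prior context.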

Unlucky_Excitement_2 t1_jczo8wf wrote on March 20, 2023 at 7:48 PM

Reply to comment by lucidraisin in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap

Those are actually super compelling problems. I'll keep an eye out. Again, thank you; you contribute so much.

Unlucky_Excitement_2 t1_jczk2lm wrote on March 20, 2023 at 7:21 PM

Reply to comment by lucidraisin in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap

Since you're the OG with this, can I pick your brain? You don't see value in Hyena Hierarchy? It does inference with a 64k context window while being 100x more efficient than flash attention. I noticed on GitHub that you plan on implementing flash attention in all your transformer-based models? HH perplexity actually scales with parameter count. Thoughts?
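For context, here's a rough sketch of why Hyena scales differently from attention. It replaces the L x L attention matrix with gated long convolutions evaluated via FFT, which costs O(L log L) instead of O(L^2). This is heavily simplified and assumes the recurrence z <- x_i * (h_i conv z) starting from z = v. In the actual Hyena paper the filters are generated implicitly by a small network over positional encodings, and the gates and value come from projections of the input. Here `gates` and `filters` are just plain lists passed in. `fft`, `causal_long_conv`, and `hyena_like` are hypothetical helper names. The code is pure Python at toy size.

```python
import cmath
from typing import List, Sequence


def fft(a: Sequence[complex], invert: bool = False) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.
    The inverse transform is returned unnormalized (caller divides by n)."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def causal_long_conv(u: Sequence[float], h: Sequence[float]) -> List[float]:
    """Hypothetical helper: y[t] = sum_{k=0..t} h[k] * u[t-k] in O(L log L).
    Assumes the filter h is as long as the input u."""
    L = len(u)
    if len(h) != L:
        raise ValueError("filter length must equal input length")
    n = 1
    while n < 2 * L:  # zero-pad so circular convolution equals linear convolution
        n *= 2
    U = fft(list(u) + [0.0] * (n - L))
    H = fft(list(h) + [0.0] * (n - L))
    y = fft([a * b for a, b in zip(U, H)], invert=True)
    return [y[t].real / n for t in range(L)]  # keep only the causal first L outputs


def hyena_like(
    v: Sequence[float],
    gates: Sequence[Sequence[float]],
    filters: Sequence[Sequence[float]],
) -> List[float]:
    """Hypothetical, simplified Hyena-style recurrence of order len(gates):
    z <- x_i * (h_i conv z), starting from z = v (element-wise gating)."""
    z = list(v)
    for x, h in zip(gates, filters):
        conv = causal_long_conv(z, h)
        z = [xi * ci for xi, ci in zip(x, conv)]
    return z


print([round(y, 6) for y in causal_long_conv([1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0])])
```

The design point is that no step builds an L x L matrix. Each order costs a few FFTs of length about 2L, which is where the long-context efficiency argument comes from.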