Unlucky_Excitement_2 t1_jczo8wf wrote
Reply to comment by lucidraisin in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
Those are actually super compelling problems. I'll keep an eye out. Again thank you, you contribute so much.
Unlucky_Excitement_2 t1_jczk2lm wrote
Reply to comment by lucidraisin in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
Since you're the OG with this, can I pick your brain? You don't see value in Hyena Hierarchy? Inference with a 64k context window, but ~100x more efficient than flash attention. I notice on GitHub you plan on implementing flash attention in all your transformer-based models? HH perplexity actually keeps improving as parameter count scales. Thoughts?
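For anyone following along, a rough sketch of what "native" flash attention in torch 2.0 looks like when you call it yourself -- shapes and dtypes here are just placeholders, and I'm pinning the dispatcher to the flash kernel so it errors instead of silently falling back:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: batch 2, 8 heads, 32k tokens, head dim 64.
q = torch.randn(2, 8, 32768, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 8, 32768, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 8, 32768, 64, device="cuda", dtype=torch.float16)

# Restrict scaled_dot_product_attention to the FlashAttention backend only,
# so it raises if that kernel can't be used rather than falling back to the
# memory-hungry math path.
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```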
Unlucky_Excitement_2 t1_jdavhcr wrote
Reply to comment by KerfuffleV2 in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
Bro, what are you talking about LOL. It's context length he's discussing. There are multiple ways [all of which I'm experimenting with] ->