mrpogiface t1_j7g03gj wrote
Do we actually know that ChatGPT is the full 175B? With Codex being 13B and still enormously powerful, and previous instruction-tuned models (in the paper) being 6.7B, it seems likely that they have it working at a much smaller parameter count
mrpogiface t1_itch4j5 wrote
Reply to comment by ggerganov in [P] Pure C/C++ port of OpenAI's Whisper by ggerganov
I am extremely interested! I'm excited to learn from it, thank you :)
mrpogiface t1_it7lvz1 wrote
https://francisbach.com/ is the best imo, really interesting and deep
mrpogiface t1_is400t9 wrote
Reply to comment by Historical_Ad2338 in [D] Wide Attention Is The Way Forward For Transformers by SuchOccasion457
Yeah, I don't think the OP paper did any scaling experiments, so I'm a bit sceptical long term, but it would be awesome for efficiency if it worked out.
Also, it turns out that the scaling laws in the paper you linked weren't quite right either (à la Chinchilla), so who knows, maybe there is something that was missed once you move out of the infinite-data regime
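For reference, the Chinchilla correction is to the fitted parametric loss below; this is just the functional form from that paper (fitted constants omitted), with N as parameter count and D as training tokens. The headline change was that compute-optimal N and D should grow roughly together, rather than spending almost all extra compute on parameters.

```latex
% Chinchilla-style parametric loss (functional form only, constants omitted)
% N = model parameters, D = training tokens, E = irreducible loss
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```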
mrpogiface t1_irz4o45 wrote
The theoretical justification for having the softmax in the loss is nice. Aside from the numerical stability bit, using softmax + cross-entropy makes sense probabilistically: minimising it is just maximum-likelihood training of the predicted categorical distribution
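As a minimal sketch of the numerical-stability point (NumPy, illustrative only): folding the softmax into the cross-entropy lets you use the log-sum-exp trick, whereas computing the softmax first and taking the log afterwards overflows for large logits.

```python
import numpy as np

def naive_cross_entropy(logits, target):
    # Softmax first, then log: np.exp overflows for large-magnitude logits.
    probs = np.exp(logits) / np.sum(np.exp(logits))
    return -np.log(probs[target])

def fused_cross_entropy(logits, target):
    # Softmax folded into the loss: shift by the max (log-sum-exp trick),
    # so every exponent is <= 0 and nothing overflows.
    shifted = logits - np.max(logits)
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[target]

logits = np.array([1000.0, 10.0, -5.0])
print(naive_cross_entropy(logits, target=0))  # nan (exp(1000) overflows to inf)
print(fused_cross_entropy(logits, target=0))  # ~0.0, finite and correct
```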
mrpogiface t1_irz4hc9 wrote
Reply to comment by ggerganov in [P] Pure C/C++ port of OpenAI's Whisper by ggerganov
As a complete WASM novice, I'd appreciate you doing it as a learning exercise for me :) But yeah, everything you outlined makes sense.
mrpogiface t1_irwjphh wrote
Reply to [P] Pure C/C++ port of OpenAI's Whisper by ggerganov
How much effort would it be to get this running in WASM / the browser?
mrpogiface t1_jckmi7d wrote
Reply to comment by kittenkrazy in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
Definitely, but you'd need to further fine-tune the model to "teach" it to make use of the additional context
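For reference, a rough sketch of the attention call the thread is about (assumes PyTorch 2.0 and a CUDA GPU with the flash/memory-efficient kernels available; the shapes are made up). The kernel handles the long sequence itself, but, as noted above, the fine-tuning needed for the model to actually exploit the extra context is a separate step not shown here.

```python
import torch
import torch.nn.functional as F

# Illustrative shapes only: batch 1, 8 heads, a 32k-token sequence, head dim 64.
# A sequence this long really needs a GPU; the CPU "math" fallback would
# materialise the full 32k x 32k attention matrix.
batch, heads, seq_len, head_dim = 1, 8, 32_768, 64
q = torch.randn(batch, heads, seq_len, head_dim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# PyTorch 2.0 dispatches to a flash / memory-efficient kernel when one is
# available, so memory stays roughly linear in sequence length.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 32768, 64])
```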