G_fucking_G
G_fucking_G t1_jczd46d wrote
Reply to [D]: Vanishing Gradients and Resnets by Blutorangensaft
https://old.reddit.com/r/MachineLearning/comments/px3hzd/d_has_the_resnet_hypothesis_been_debunked/
The advantage of ResNets is most probably not that they eliminate vanishing gradients but that they smooth the loss landscape.
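For context, the vanishing-gradient explanation usually rests on the identity skip path. Assume each block computes x_{l+1} = x_l + F(x_l) with an identity shortcut and no activation after the addition (a simplifying assumption; not every ResNet variant satisfies it). Unrolling from block l to the last block L, with loss \mathcal{L}, gives:

```latex
x_L = x_l + \sum_{i=l}^{L-1} F(x_i)
\qquad\Longrightarrow\qquad
\frac{\partial \mathcal{L}}{\partial x_l}
  = \frac{\partial \mathcal{L}}{\partial x_L}
    \left( I + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} F(x_i) \right)
```

Because of the identity term I, part of the gradient reaches any earlier block without passing through the weight layers, however deep the network is. The loss-landscape view argues that this is not the main reason ResNets train well.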
G_fucking_G t1_jc1tmli wrote
Reply to comment by CashyJohn in [R] Introducing Ursa from Speechmatics | 25% improvement over Whisper by jplhughes
Which metric are you basing this on? I'm not deep into ASR, but in the Whisper paper it is compared to wav2vec 2.0, and Whisper is better in most categories.
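The Whisper paper reports its comparisons as word error rate (WER), the word-level edit distance between a reference transcript and the model output divided by the number of reference words. A minimal sketch, assuming plain whitespace tokenization and no text normalization (real evaluations normalize casing and punctuation first, so absolute numbers will differ):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("the" -> "a") out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Since insertions count as errors, WER can exceed 1.0. That is also why a headline such as "25% improvement" is worth pinning down: it could mean an absolute or a relative reduction in WER.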
G_fucking_G t1_jdifa1c wrote
Reply to [P] ChatGPT with GPT-2: A minimum example of aligning language models with RLHF similar to ChatGPT by liyanjia92
Very interesting.
Quick question: how long does training take? For:
I saw you used one 3090 Ti, so did it take hours, days, or weeks?