sanderbaduk
sanderbaduk t1_jc9o4hm wrote
Reply to [D] Choosing Cloud vs local hardware for training LLMs. What's best for a small research group? by PK_thundr
Training for what? Classification, embedding, generation?
sanderbaduk t1_iw7ctql wrote
Mostly custom heads and custom loss functions, more than being concerned with the base model. Implementing the latter is increasingly like implementing your own sort function or something: a learning exercise.
sanderbaduk t1_irv1lo8 wrote
For classification, you get the same answer taking the argmax of logits vs the argmax of probabilities. For training, combining the softmax or sigmoid with the loss function can be more numerically stable.
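A minimal NumPy sketch of both points (my own illustration, not from the thread): argmax is unchanged by softmax since softmax is monotonic, and a naive softmax-then-log cross-entropy overflows on large logits while the fused log-softmax form stays finite.

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability; result is unchanged.
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([2.0, -1.0, 0.5])
# Softmax is monotonic, so the argmax is the same either way.
assert np.argmax(logits) == np.argmax(softmax(logits))

big = np.array([1000.0, 0.0])
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(big) / np.exp(big).sum()  # exp(1000) overflows -> nan
# Fused log-softmax: log p_i = z_i - logsumexp(z), computed stably.
fused = big - (big.max() + np.log(np.exp(big - big.max()).sum()))

assert np.isnan(naive).any()        # the naive route breaks
assert np.isfinite(fused).all()     # the fused route does not
```

This is why libraries expose combined ops like `CrossEntropyLoss` (logits in) rather than expecting you to apply softmax yourself before the loss.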
sanderbaduk t1_iqvmins wrote
Reply to comment by Imaginary_Carrot4092 in [D] Model not learning data by Imaginary_Carrot4092
You have a single input and single output? It's likely to just learn something like the smoothed average, which is quite reasonable.
Also it seems your post is better in the subreddits mentioned under rule 4.
sanderbaduk t1_iqvi17b wrote
- Aggressive marketing
- High pricing (per-user and such nonsense) which is often not clearly indicated on a website
- Tools that fall over with a ton of bugs and errors when you do anything other than the demo script
sanderbaduk t1_jcjy47g wrote
Reply to [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
Does it have trouble stopping? I see it ramble on, e.g. https://imgur.com/a/e6k7pSP