sanderbaduk
sanderbaduk t1_jc9o4hm wrote
Reply to [D] Choosing Cloud vs local hardware for training LLMs. What's best for a small research group? by PK_thundr
Training for what? Classification, embedding, generation?
sanderbaduk t1_iw7ctql wrote
Mostly custom heads and custom loss functions, more than being concerned with the base model. Implementing the latter is increasingly like implementing your own sort function or something: a learning exercise.
sanderbaduk t1_irv1lo8 wrote
For classification, you get the same answer taking the argmax of logits vs the argmax of probabilities. For training, combining the softmax or sigmoid with the loss function can be more numerically stable.
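A minimal NumPy sketch of both points (my own illustration, not from the thread): argmax is unchanged by softmax since softmax is monotonic, and a naive softmax-then-log cross-entropy overflows on large logits while the fused log-softmax form stays finite.

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability; result is unchanged.
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([2.0, -1.0, 0.5])
# Softmax is monotonic, so the argmax is the same either way.
assert np.argmax(logits) == np.argmax(softmax(logits))

big = np.array([1000.0, 0.0])
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(big) / np.exp(big).sum()  # exp(1000) overflows -> nan
# Fused log-softmax: log p_i = z_i - logsumexp(z), computed stably.
fused = big - (big.max() + np.log(np.exp(big - big.max()).sum()))

assert np.isnan(naive).any()        # the naive route breaks
assert np.isfinite(fused).all()     # the fused route does not
```

This is why libraries expose combined ops like `CrossEntropyLoss` (logits in) rather than expecting you to apply softmax yourself before the loss.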
sanderbaduk t1_iqvmins wrote
Reply to comment by Imaginary_Carrot4092 in [D] Model not learning data by Imaginary_Carrot4092
You have a single input and single output? It's likely to just learn something like the smoothed average, which is quite reasonable.
Also it seems your post is better in the subreddits mentioned under rule 4.
sanderbaduk t1_iqvi17b wrote
- Aggressive marketing
- High pricing (per-user and such nonsense) which is often not clearly indicated on a website
- Tools that fall over with a ton of bugs and errors when you do anything other than the demo script
sanderbaduk t1_jcjy47g wrote
Reply to [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
Does it have trouble stopping? I see it ramble on, e.g. https://imgur.com/a/e6k7pSP