Seankala
Seankala t1_jdz64gw wrote
Reply to comment by hardmaru in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
Thanks!
Seankala t1_jdz53mn wrote
Reply to [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
It'd be nice to see the qualifications of the authors.
Seankala OP t1_j9nqmf5 wrote
Reply to comment by currentscurrents in [D] 14.5M-15M is the smallest number of parameters I could find for current pretrained language models. Are there any that are smaller? by Seankala
That's true for all of the models. I don't really need anything cool, though; all I need is a solid model that can perform simple tasks like text classification or NER well.
Seankala OP t1_j9npctd wrote
Reply to comment by adt in [D] 14.5M-15M is the smallest number of parameters I could find for current pretrained language models. Are there any that are smaller? by Seankala
Thanks for the detailed answer! My use case is that the company I work at currently uses image-based models for e-commerce purposes, but we want to use text-based models as well. The image-based model(s) are already taking up around 30-50M parameters so I didn't want to just bring in a 100M+ parameter model. Even 15M seems quite big.
Seankala OP t1_j9np9ae wrote
Reply to comment by chogall in [D] 14.5M-15M is the smallest number of parameters I could find for current pretrained language models. Are there any that are smaller? by Seankala
I guess at least 100M+ parameters? I like to think of the BERT-base model as being the "starting point" of LLMs.
Seankala t1_j8u8ogp wrote
I hear my colleagues complain about the same thing. And then go back to doing `AutoModel.from_pretrained(sdfsdf)`.
Seankala t1_j8r2317 wrote
Reply to comment by currentscurrents in [D] Lion , An Optimizer That Outperforms Adam - Symbolic Discovery of Optimization Algorithms by ExponentialCookie
> ...just the hyperparameter was the optimizer design itself.
Probably one of the best things I've read today lol. Reminds me of when old colleagues of mine would have lists of different PyTorch optimizers and just loop through them.
Seankala t1_j7sssh9 wrote
Feedly. A great tool that curates news and other blogs/articles/social media via RSS.
Seankala t1_j549ygz wrote
Reply to [D] Simple Questions Thread by AutoModerator
Are there any Slack channels or Discord Servers for ML practitioners to talk about stuff?
Seankala t1_j17r2we wrote
Reply to [D] Using "duplicates" during training? by DreamyPen
Also make sure to change your random seed for each run and report the mean and variance of each run's performance on the test set. As a principle, you should always set aside a test set that you never touch except to measure final performance.
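A minimal sketch of what I mean, assuming a `train_and_evaluate` stand-in for whatever training pipeline you use (the scores here are faked for illustration):

```python
# Hypothetical sketch: evaluate over several random seeds and report the
# mean and variance of test-set performance across runs.
import random
import statistics

def train_and_evaluate(seed):
    # Placeholder: in practice, seed everything (Python, NumPy, framework),
    # train the model, and return its score on the held-out test set.
    random.seed(seed)
    return 0.80 + random.random() * 0.05  # fake test accuracy

scores = [train_and_evaluate(seed) for seed in (0, 1, 2, 3, 4)]
mean = statistics.mean(scores)
var = statistics.variance(scores)
print(f"test accuracy: {mean:.4f} (variance {var:.6f} over {len(scores)} seeds)")
```

Reporting variance alongside the mean makes it obvious when a "gain" is just seed luck.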
Seankala t1_j17qu4r wrote
Reply to [D] Hype around LLMs by Ayicikio
Frankly I feel the same way about diffusion models. It's nothing more than "whoa cool!" for me. After doing research in NLP in graduate school and doing NLP in the real world, I'm increasingly feeling a huge disconnect between academic research and the real world.
Seankala t1_iwab2nh wrote
I'm also in NLP and I usually just use a pretrained model from HuggingFace as a backbone and build on top of that. It's not usually anything complicated though, maybe an MLP.
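The pattern is roughly this; a hypothetical sketch, where the dummy `nn.Linear` backbone stands in for a pretrained encoder (in practice you'd load one with `AutoModel.from_pretrained(...)` from `transformers`):

```python
# Hypothetical sketch: a pretrained encoder as a backbone with a small
# MLP head built on top, as described above.
import torch
import torch.nn as nn

class ClassifierWithBackbone(nn.Module):
    def __init__(self, backbone, hidden_size, num_labels):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Sequential(      # small MLP head on top
            nn.Linear(hidden_size, hidden_size),
            nn.Tanh(),
            nn.Linear(hidden_size, num_labels),
        )

    def forward(self, x):
        features = self.backbone(x)     # (batch, hidden_size)
        return self.head(features)      # (batch, num_labels)

# Stand-in backbone so the sketch runs without downloading weights.
backbone = nn.Linear(16, 32)
model = ClassifierWithBackbone(backbone, hidden_size=32, num_labels=3)
logits = model(torch.randn(4, 16))
print(logits.shape)  # torch.Size([4, 3])
```

With a real HuggingFace backbone you'd feed the pooled or `[CLS]` representation into the head instead of the raw input.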
Seankala t1_ivdqt68 wrote
I think the word "misinformation" is a little dangerous to be using here. People can criticize the authors for not providing any actual novelty (which is super common), but pointing fingers and saying they're "spreading misinformation" is a little much.
Seankala t1_ivdqnvf wrote
Reply to comment by apliens in [D] Git Re-Basin Paper Accused of Misinformation by fryingnem0
Nah screw that. Everything should be publicly discussed.
Seankala t1_iuuxb24 wrote
There's not really any merit. That's why the reviewing process is such a mess. There's virtually no way to incentivize or penalize people for being shitty reviewers.
Seankala OP t1_iup00ot wrote
Reply to comment by trendymoniker in [D] Is there a way we can score "popularity" on social media posts? by Seankala
Thanks for the comment. I probably should have been a little more descriptive, but we're not really trying to optimize the metrics themselves. This is just one part of a bigger pipeline.
Seankala OP t1_iuozx7h wrote
Reply to comment by MustachedSpud in [D] Is there a way we can score "popularity" on social media posts? by Seankala
Thanks! That sounds like something that would be easily computable and reasonable.
Seankala OP t1_iuozuvw wrote
Reply to comment by Spirited_Expert64 in [D] Is there a way we can score "popularity" on social media posts? by Seankala
Ah thanks for the comment but I think we won't have to account for sentiment information. I should have probably said "publicity" or something rather than "popularity" (it makes sense in my native language). Negative sentiment would also mean that something is trending, and that's what we're trying to measure rather than how positively people would view something.
Seankala t1_itk390z wrote
Reply to comment by invertedpassion in [D] Any pre trained retrieval based language models available? by invertedpassion
Try looking up DensePhrases, it was made by a colleague of mine and may be what you're looking for. They also have an online demo you can try.
I'm not sure what you mean by "retrieval-based language model" though. I don't think there's any language model that's made solely for the purpose of retrieval.
Seankala t1_itjqp2b wrote
Would open-domain QA models be relevant for this topic?
Seankala t1_itjqimk wrote
Reply to [D] Simple Questions Thread by AutoModerator
Why do we pronounce "ICLR" as "eye-clear" but not "ICML" as "eye-camel?"
Seankala t1_isv7m8s wrote
Lol are you by any chance in Asia? As an Asian who was raised in the US but is living and working in his "mother country," everything about this post screams "Asia" to me.
Seankala t1_isrzhkb wrote
You're not going to find theory-dense papers at major ML conferences. Most of the reviewers don't bother going through them and people usually find theory boring compared to "super duper cool" architectures that lead to 0.1% increase in performance. Like the top comment said, COLT is a good place to start.
Seankala t1_is8sbpn wrote
Reply to comment by [deleted] in [D] Manually creating the target data is considered as data leakage. by [deleted]
I normally wouldn't entertain that, but upon checking your profile it seems like you're interested in financial ML. As someone who's briefly done research in financial ML I can tell you that 99% of the papers you read and ML code you run are pretty much BS. You're better off using ML very sparingly and using if-else statements.
Seankala t1_jdz6kty wrote
Reply to comment by wazis in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
Yeah I read through the whole thing and it's not surprising. Train-test contamination has been a problem for a while now.