Seankala t1_jdz6kty wrote on March 28, 2023 at 7:11 AM

Reply to comment by wazis in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-

Yeah I read through the whole thing and it's not surprising. Train-test contamination has been a problem for a while now.
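
For anyone wondering what a basic contamination check looks like, here is a rough sketch. It assumes word-level n-gram overlap is a good enough signal and that 8 words is a sensible window; both are my assumptions, not anything from the linked post. The function names are made up for illustration.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def ngrams(text: str, n: int) -> set:
    """Return the set of word-level n-grams in the normalized text."""
    tokens = normalize(text).split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def is_contaminated(test_example: str, train_corpus: list, n: int = 8) -> bool:
    """Flag a test example if it shares any n-gram with any training document."""
    test_grams = ngrams(test_example, n)
    return any(test_grams & ngrams(doc, n) for doc in train_corpus)


train_corpus = ["the quick brown fox jumps over the lazy dog near the river bank"]
print(is_contaminated("The quick brown fox jumps over the lazy dog today", train_corpus))
print(is_contaminated("a completely different sentence with no shared long span of words", train_corpus))
```

This only catches verbatim or near-verbatim overlap; paraphrased or reformatted benchmark problems would slip through.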

Seankala t1_jdz64gw wrote on March 28, 2023 at 7:04 AM

Reply to comment by hardmaru in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-

Thanks!

Seankala t1_jdz53mn wrote on March 28, 2023 at 6:50 AM

Reply to [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-

It'd be nice to see the qualifications of the authors.

Seankala OP t1_j9nqmf5 wrote on February 23, 2023 at 7:41 AM

Reply to comment by currentscurrents in [D] 14.5M-15M is the smallest number of parameters I could find for current pretrained language models. Are there any that are smaller? by Seankala

That's true for all of the models. I don't really need anything cool though, all I need is a solid model that can perform simple tasks like text classification or NER well.

Seankala OP t1_j9npctd wrote on February 23, 2023 at 7:25 AM

Reply to comment by adt in [D] 14.5M-15M is the smallest number of parameters I could find for current pretrained language models. Are there any that are smaller? by Seankala

Thanks for the detailed answer! My use case is that the company I work at currently uses image-based models for e-commerce purposes, but we want to use text-based models as well. The image-based model(s) are already taking up around 30-50M parameters so I didn't want to just bring in a 100M+ parameter model. Even 15M seems quite big.

Seankala OP t1_j9np9ae wrote on February 23, 2023 at 7:24 AM

Reply to comment by chogall in [D] 14.5M-15M is the smallest number of parameters I could find for current pretrained language models. Are there any that are smaller? by Seankala

I guess at least 100M+ parameters? I like to think of the BERT-base model as being the "starting point" of LLMs.

Seankala t1_j8u8ogp wrote on February 16, 2023 at 11:55 PM

Reply to [D] HuggingFace considered harmful to the community. /rant by drinkingsomuchcoffee

I hear my colleagues complain about the same thing. And then go back to doing AutoModel.from_pretrained(sdfsdf).

Seankala t1_j8r2317 wrote on February 16, 2023 at 9:52 AM

Reply to comment by currentscurrents in [D] Lion , An Optimizer That Outperforms Adam - Symbolic Discovery of Optimization Algorithms by ExponentialCookie

> ...just the hyperparameter was the optimizer design itself.

Probably one of the best things I've read today lol. Reminds me of when old colleagues of mine would have lists of different PyTorch optimizers and just loop through them.

Seankala t1_j7sssh9 wrote on February 9, 2023 at 3:36 AM

Reply to What are the best resources to stay up to date with latest news ? [D] by [deleted]

Feedly. A great tool that curates news and other blogs/articles/social media via RSS.

Seankala t1_j549ygz wrote on January 20, 2023 at 7:52 AM

Reply to [D] Simple Questions Thread by AutoModerator

Are there any Slack channels or Discord Servers for ML practitioners to talk about stuff?

Seankala t1_j17r2we wrote on December 22, 2022 at 7:43 AM

Reply to [D] Using "duplicates" during training? by DreamyPen

Also make sure to change your random seed for each run and calculate the mean and variance of the runs' performance on the test set. As a principle, you should always set aside a test set that you never touch except to measure final performance.
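
A minimal sketch of the multi-seed idea, using only the standard library. `train_and_evaluate` is a hypothetical placeholder that fakes a test score; in practice it would seed your framework, train on the training split, and return the metric on the held-out test set.

```python
import random
import statistics


def train_and_evaluate(seed: int) -> float:
    """Hypothetical stand-in for a real training run; returns a test-set score."""
    rng = random.Random(seed)
    # Placeholder: simulate a score near 0.80 with seed-dependent noise.
    return 0.80 + rng.uniform(-0.02, 0.02)


def summarize_runs(seeds: list) -> dict:
    """Run once per seed and report the mean and sample variance of the scores."""
    scores = [train_and_evaluate(seed) for seed in seeds]
    return {
        "scores": scores,
        "mean": statistics.mean(scores),
        "variance": statistics.variance(scores),  # sample variance, needs >= 2 runs
    }


summary = summarize_runs([0, 1, 2, 3, 4])
print(f"mean={summary['mean']:.4f} variance={summary['variance']:.6f}")
```

Reporting the spread alongside the mean makes it easier to tell whether a difference between two setups is real or just seed noise.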

Seankala t1_j17qu4r wrote on December 22, 2022 at 7:40 AM

Reply to [D] Hype around LLMs by Ayicikio

Frankly I feel the same way about diffusion models. It's nothing more than "whoa cool!" for me. After doing research in NLP in graduate school and doing NLP in the real world, I'm increasingly feeling a huge disconnect between academic research and the real world.

Seankala t1_iwab2nh wrote on November 14, 2022 at 3:30 AM

Reply to [D] When was the last time you wrote a custom neural net? by cautioushedonist

I'm also in NLP and I usually just use a pretrained model from HuggingFace as a backbone and build on top of that. It's not usually anything complicated though, maybe an MLP.

Seankala t1_ivdqt68 wrote on November 7, 2022 at 5:20 AM

Reply to [D] Git Re-Basin Paper Accused of Misinformation by fryingnem0

I think the word "misinformation" is a little dangerous to be using here. People can criticize the authors for not providing any actual novelty (which is actually super common), but pointing fingers and saying they're "spreading misinformation" is a little much.

Seankala t1_ivdqnvf wrote on November 7, 2022 at 5:19 AM

Reply to comment by apliens in [D] Git Re-Basin Paper Accused of Misinformation by fryingnem0

Nah screw that. Everything should be publicly discussed.

Seankala t1_iuuxb24 wrote on November 3, 2022 at 4:51 AM

Reply to [D] What are the benefits of being a reviewer? by Signal-Mixture-4046

There's not really any merit. That's why the reviewing process is such a mess. There's virtually no way to incentivize or penalize people for being shitty reviewers.

Seankala OP t1_iup00ot wrote on November 1, 2022 at 11:40 PM

Reply to comment by trendymoniker in [D] Is there a way we can score "popularity" on social media posts? by Seankala

Thanks for the comment. I should probably have been a little more descriptive, but we're not really trying to optimize the metrics themselves. This is just one part that's going to be used for a bigger pipeline.

Seankala OP t1_iuozx7h wrote on November 1, 2022 at 11:39 PM

Reply to comment by MustachedSpud in [D] Is there a way we can score "popularity" on social media posts? by Seankala

Thanks! That sounds like something that would be easily computable and reasonable.

Seankala OP t1_iuozuvw wrote on November 1, 2022 at 11:39 PM

Reply to comment by Spirited_Expert64 in [D] Is there a way we can score "popularity" on social media posts? by Seankala

Ah thanks for the comment but I think we won't have to account for sentiment information. I should have probably said "publicity" or something rather than "popularity" (it makes sense in my native language). Negative sentiment would also mean that something is trending, and that's what we're trying to measure rather than how positively people would view something.

Seankala t1_itk390z wrote on October 24, 2022 at 5:56 AM

Reply to comment by invertedpassion in [D] Any pre trained retrieval based language models available? by invertedpassion

Try looking up DensePhrases, it was made by a colleague of mine and may be what you're looking for. They also have an online demo you can try.

I'm not sure what you mean by "retrieval-based language model" though. I don't think there's any language model that's made solely for the purpose of retrieval.

Seankala t1_itjqp2b wrote on October 24, 2022 at 3:42 AM

Reply to [D] Any pre trained retrieval based language models available? by invertedpassion

Would open-domain QA models be relevant for this topic?

Seankala t1_itjqimk wrote on October 24, 2022 at 3:41 AM

Reply to [D] Simple Questions Thread by AutoModerator

Why do we pronounce "ICLR" as "eye-clear" but not "ICML" as "eye-camel?"

Seankala t1_isv7m8s wrote on October 18, 2022 at 11:01 PM

Reply to [D] How frustrating are the ML interviews these days!!! TOP 3% interview joke by Mogady

Lol are you by any chance in Asia? As an Asian who was raised in the US but is living and working in his "mother country," everything about this post screams "Asia" to me.

Seankala t1_isrzhkb wrote on October 18, 2022 at 7:33 AM

Reply to [D] Machine Learning conferences/journals with a mathematical slant? by vajraadhvan

You're not going to find theory-dense papers at major ML conferences. Most of the reviewers don't bother going through them, and people usually find theory boring compared to "super duper cool" architectures that lead to a 0.1% increase in performance. Like the top comment said, COLT is a good place to start.

Seankala t1_is8sbpn wrote on October 14, 2022 at 3:08 AM

Reply to comment by [deleted] in [D] Manually creating the target data is considered as data leakage. by [deleted]

I normally wouldn't entertain that, but upon checking your profile it seems like you're interested in financial ML. As someone who's briefly done research in financial ML I can tell you that 99% of the papers you read and ML code you run are pretty much BS. You're better off using ML very sparingly and using if-else statements.