new_name_who_dis_

new_name_who_dis_ t1_ixiofup wrote

Yea exactly. If you’re citing a paper you’re implicitly citing all of the papers that paper cited.

No one is citing the original perceptron paper even though pretty much every deep learning paper uses some form of a perceptron. The citation is implied: you go from the more complex architectures you cited, to the simpler ones those cited, and so on until you get to the perceptron.

6

new_name_who_dis_ t1_ixie59m wrote

The GRU paper cites the LSTM paper, so it's fine imo, especially if they're using the GRU architecture and not the LSTM architecture.

Citing the original LSTM paper is kind of dumb in general since the modern LSTM architecture is not the one described in that paper. You really need to cite one of the later papers that introduced the forget gate, if you are using the default LSTM implementation.

5

new_name_who_dis_ t1_ixhxgdg wrote

> Schmidhuber literally just wants to be cited when people refer to work that they did.

He has some of the most cited papers in the field. What Schmidhuber wants is to be cited for papers that almost no one read, and whose ideas are only vaguely relevant to some of the new breakthrough papers, and only if you really squint.

He's a very good researcher and has many cool ideas, and it'd be much better if he actually encouraged people to adopt them the proper way (like by creating demos and easy-to-use libraries, and open-sourcing code/weights) -- instead of trying to prove that already widely adopted techniques are actually special cases of his own.

30

new_name_who_dis_ t1_ivybdq7 wrote

Oh, I didn’t know them. Still, if it’s only been out a few months, then for it to be cited it would have needed to be noticed by someone writing their next research paper, and that paper would already have to be published.

Unless preprints on arXiv count. But even then it takes weeks if not months to do the research and write a paper, so that leaves a very small window for possible citations at this point.

2

new_name_who_dis_ t1_ivy2et6 wrote

Getting lots of citations a few months after your paper comes out only happens with papers written by famous researchers. Normal people need to work to get people to notice their research (which is why they are sharing it here now).

And usually a paper starts getting citations after it’s already been presented at a conference, where promoting it is easiest.

13

new_name_who_dis_ t1_ivtav2j wrote

No, I’m not saying to discard BERT. You still use BERT as the encoder and use an MDN-like network as the final layer. It could still be a self-attention layer, just trained with the MDN loss function. An MDN isn't a different architecture; it's just a different loss function for your final output that isn't deterministic but allows for multiple outputs.
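To make that concrete, here's a minimal PyTorch sketch of what an MDN-style final layer plus its loss could look like. Everything here (the `MDNHead` name, the hidden size, a 1-D Gaussian mixture target) is an illustrative assumption, not from any specific BERT setup:

```python
import math
import torch
import torch.nn as nn

class MDNHead(nn.Module):
    """Mixture density head: maps an encoder feature vector to the
    parameters of a K-component Gaussian mixture over a 1-D target."""
    def __init__(self, hidden_dim, n_components=5):
        super().__init__()
        self.pi = nn.Linear(hidden_dim, n_components)         # mixture weight logits
        self.mu = nn.Linear(hidden_dim, n_components)         # component means
        self.log_sigma = nn.Linear(hidden_dim, n_components)  # log std devs

    def forward(self, h):
        return self.pi(h), self.mu(h), self.log_sigma(h)

def mdn_nll(pi_logits, mu, log_sigma, y):
    """Negative log-likelihood of y under the predicted mixture."""
    log_pi = torch.log_softmax(pi_logits, dim=-1)
    sigma = log_sigma.exp()
    # log N(y | mu_k, sigma_k) for each component k
    log_prob = (-0.5 * (((y.unsqueeze(-1) - mu) / sigma) ** 2)
                - log_sigma - 0.5 * math.log(2 * math.pi))
    return -torch.logsumexp(log_pi + log_prob, dim=-1).mean()

# Smoke test: a batch of 2 made-up "encoder" vectors of size 16.
h = torch.randn(2, 16)
head = MDNHead(16, n_components=5)
loss = mdn_nll(*head(h), torch.randn(2))
```

The encoder (BERT or anything else) is untouched; only the head and the loss change.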

1

new_name_who_dis_ t1_ivq7gwm wrote

To make beam search work with BERT you'd need to change the way BERT works, which you could do, but it's probably too complicated for what you want to do.

What you could do instead is just use a non-deterministic classifier, like a Mixture Density Network. It predicts several outputs as well as their likelihoods.
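To show what "several outputs plus their likelihoods" means in practice, here's a toy sketch reading candidates off a mixture's parameters. The numbers are made up, not from any trained model:

```python
import torch

# Hypothetical MDN output for one example, K=3 mixture components:
# unnormalized weights (logits) and the component means.
pi_logits = torch.tensor([2.0, 0.5, -1.0])
mu = torch.tensor([0.1, 0.8, -0.3])

# The component means are the candidate outputs; softmax over the
# weight logits gives each candidate's likelihood.
probs = torch.softmax(pi_logits, dim=-1)

# Rank candidates by likelihood, most probable first.
candidates = sorted(zip(mu.tolist(), probs.tolist()),
                    key=lambda t: -t[1])
```

So instead of one deterministic prediction you get a ranked list of (output, likelihood) pairs to work with.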

2

new_name_who_dis_ t1_iushhel wrote

Yes. Idk what libraries are popular now, but I've used PyTorch Geometric, and it takes as input V, which is NxF where N is the number of nodes in the graph and F is the number of node features, and E, which is 2xnum_edges, where each column is a directed edge from the index of one node in V to the index of another node in V. E is basically the sparse representation of the adjacency matrix.
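Roughly, that V/E format looks like this in plain PyTorch tensors (a toy 4-node graph; in PyTorch Geometric these would typically be the `x` and `edge_index` fields of a `Data` object):

```python
import torch

# Toy graph: 4 nodes, 3 features each. This is the N x F matrix "V"
# (PyTorch Geometric calls it `x`).
x = torch.randn(4, 3)

# Directed edges as a 2 x num_edges tensor ("E"; PyG calls it
# `edge_index`). Each column [src, dst] is an edge from node src to
# node dst, i.e. a sparse encoding of the adjacency matrix.
edge_index = torch.tensor([[0, 1, 2],
                           [1, 2, 3]], dtype=torch.long)

# Recover the dense 4 x 4 adjacency matrix for comparison.
adj = torch.zeros(4, 4)
adj[edge_index[0], edge_index[1]] = 1.0
```

Storing only the existing edges is why this scales to large sparse graphs, where the dense NxN adjacency matrix would be mostly zeros.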

18

new_name_who_dis_ t1_iurd1je wrote

It's funny because as a human looking at those board positions I'd potentially also pass and say to my opponent, "come on, those stones are dead, we both know it", and if they disagree we start playing again.

Like in the first game, the only stone that could potentially make life is the bottom-right stone, and even then probably not. All the other stones are unquestionably dead.

3

new_name_who_dis_ t1_iumsvkq wrote

It depends. I use PyTorch and mostly it works. But sometimes I see a cool library, download it, and try to run it, and I get a C-level error that says "unsupported hardware" -- in which case I need to run that code on Linux.

I think it should be fine, since your laptop should just be a client for doing deep learning, not the server. So whenever you have problems you can just test on a Linux machine.

I've personally never written code myself that throws the unsupported-hardware error, so it must be some specialty accelerated code that only works with Intel or whatever. But yeah, this hasn't been an issue when writing code, only when trying to use other people's code (and even then it's pretty rare; it usually only happens when you clone from, like, NVIDIA or something).

3