new_name_who_dis_

new_name_who_dis_ t1_ixiofup wrote

Yea exactly. If you’re citing a paper you’re implicitly citing all of the papers that paper cited.

No one is citing the original perceptron paper even though pretty much every deep learning paper uses some form of a perceptron. The citation is implied: you go from the more complex architectures you cited, to the simpler ones those cited, and so on until you get to the perceptron.

6

new_name_who_dis_ t1_ixie59m wrote

The GRU paper cites the LSTM paper, so it's fine imo, especially if they're using the GRU architecture and not the LSTM architecture.

Citing the original LSTM paper is kind of dumb in general since the modern LSTM architecture is not the one described in that paper. You really need to cite one of the later papers that introduced the forget gate, if you are using the default LSTM implementation.

5

new_name_who_dis_ t1_ixhxgdg wrote

> Schmidhuber literally just wants to be cited when people refer to work that they did.

He has some of the most cited papers in the field. What Schmidhuber wants is to be cited for papers that almost no one read, and whose ideas are only vaguely relevant to some of the new breakthrough papers, and only if you really squint.

He's a very good researcher and has many cool ideas, and it'd be much better if he actually encouraged people to adopt them the proper way (like by creating demos and easy-to-use libraries, and open-sourcing code/weights) -- instead of trying to prove that already widely adopted techniques are actually special cases of his own.

30

new_name_who_dis_ t1_ivybdq7 wrote

Oh, I didn’t know them. Still, if it’s only been out a few months, then for it to be cited it would have needed to be noticed by someone writing their next research paper, and that paper would already have to be published.

Unless preprints on arXiv count. But even then it takes weeks if not months to do the research and write a paper, so that leaves a very small window for possible citations at this point.

2

new_name_who_dis_ t1_ivy2et6 wrote

Getting lots of citations a few months after your paper comes out only happens with papers written by famous researchers. Normal people need to work to get people to notice their research (which is why they are sharing it here now).

And usually a paper starts getting citations after it’s already been presented at a conference, where promoting it is easiest.

13

new_name_who_dis_ t1_ivtav2j wrote

No, I’m not saying to discard BERT. You still use BERT as the encoder and use an MDN-like network as the final layer. It could still be a self-attention layer, just trained with the MDN loss function. An MDN isn't a different architecture; it's just a different loss function for your final output that isn't deterministic but allows for multiple outputs.
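To make that concrete, here's a minimal PyTorch sketch of what an MDN-style final layer plus its loss could look like. Everything here (the `MDNHead` name, the hidden size, a 1-D Gaussian mixture target) is an illustrative assumption, not from any specific BERT setup:

```python
import math
import torch
import torch.nn as nn

class MDNHead(nn.Module):
    """Mixture density head: maps an encoder feature vector to the
    parameters of a K-component Gaussian mixture over a 1-D target."""
    def __init__(self, hidden_dim, n_components=5):
        super().__init__()
        self.pi = nn.Linear(hidden_dim, n_components)         # mixture weight logits
        self.mu = nn.Linear(hidden_dim, n_components)         # component means
        self.log_sigma = nn.Linear(hidden_dim, n_components)  # log std devs

    def forward(self, h):
        return self.pi(h), self.mu(h), self.log_sigma(h)

def mdn_nll(pi_logits, mu, log_sigma, y):
    """Negative log-likelihood of y under the predicted mixture."""
    log_pi = torch.log_softmax(pi_logits, dim=-1)
    sigma = log_sigma.exp()
    # log N(y | mu_k, sigma_k) for each component k
    log_prob = (-0.5 * (((y.unsqueeze(-1) - mu) / sigma) ** 2)
                - log_sigma - 0.5 * math.log(2 * math.pi))
    return -torch.logsumexp(log_pi + log_prob, dim=-1).mean()

# Smoke test: a batch of 2 made-up "encoder" vectors of size 16.
h = torch.randn(2, 16)
head = MDNHead(16, n_components=5)
loss = mdn_nll(*head(h), torch.randn(2))
```

The encoder (BERT or anything else) is untouched; only the head and the loss change.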

1

new_name_who_dis_ t1_ivq7gwm wrote

To make beam search work with BERT you'd need to change the way BERT works, which you could do, but it's probably too complicated for what you want to do.

What you could do instead is just use a non-deterministic classifier, like a Mixture Density Network. It predicts several outputs as well as their likelihoods.
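To show what "several outputs plus their likelihoods" means in practice, here's a toy sketch reading candidates off a mixture's parameters. The numbers are made up, not from any trained model:

```python
import torch

# Hypothetical MDN output for one example, K=3 mixture components:
# unnormalized weights (logits) and the component means.
pi_logits = torch.tensor([2.0, 0.5, -1.0])
mu = torch.tensor([0.1, 0.8, -0.3])

# The component means are the candidate outputs; softmax over the
# weight logits gives each candidate's likelihood.
probs = torch.softmax(pi_logits, dim=-1)

# Rank candidates by likelihood, most probable first.
candidates = sorted(zip(mu.tolist(), probs.tolist()),
                    key=lambda t: -t[1])
```

So instead of one deterministic prediction you get a ranked list of (output, likelihood) pairs to work with.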

2

new_name_who_dis_ t1_iushhel wrote

Yes. Idk what libraries are popular now, but I've used PyTorch Geometric, and it takes as input V, which is NxF where N is the number of nodes in the graph and F is the number of node features, and E, which is 2xnum_edges, where each column is a directed edge from the index of one node in V to the index of another node in V. E is basically the sparse representation of the adjacency matrix.
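Roughly, that V/E format looks like this in plain PyTorch tensors (a toy 4-node graph; in PyTorch Geometric these would typically be the `x` and `edge_index` fields of a `Data` object):

```python
import torch

# Toy graph: 4 nodes, 3 features each. This is the N x F matrix "V"
# (PyTorch Geometric calls it `x`).
x = torch.randn(4, 3)

# Directed edges as a 2 x num_edges tensor ("E"; PyG calls it
# `edge_index`). Each column [src, dst] is an edge from node src to
# node dst, i.e. a sparse encoding of the adjacency matrix.
edge_index = torch.tensor([[0, 1, 2],
                           [1, 2, 3]], dtype=torch.long)

# Recover the dense 4 x 4 adjacency matrix for comparison.
adj = torch.zeros(4, 4)
adj[edge_index[0], edge_index[1]] = 1.0
```

Storing only the existing edges is why this scales to large sparse graphs, where the dense NxN adjacency matrix would be mostly zeros.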

18

new_name_who_dis_ t1_iurd1je wrote

It's funny because as a human looking at those board positions I'd potentially also pass and say to my opponent, "come on, those stones are dead, we both know it", and if they disagree we start playing again.

Like in the first game, the only stone that could potentially make life is the bottom-right stone, and even then probably not. All the other stones are unquestionably dead.

3

new_name_who_dis_ t1_iumsvkq wrote

It depends. I use PyTorch and mostly it works. But sometimes I see a cool library, download it, and try to run it, and I get a C-level error that says "unsupported hardware" -- in which case I need to run that code on Linux.

I think it should be fine, since your laptop should just be a client for doing deep learning, not the server. So whenever you have problems you can just test on a Linux machine.

I've personally never written code myself that throws the unsupported-hardware error, so it must be some specialty accelerated code that only works with Intel or whatever. But yeah, this hasn't been an issue when writing code, only when trying to use other people's code (and even then it's pretty rare; it usually only happens when you clone from, like, NVIDIA or something).

3