I’m looking for a recent paper (from the last 5 years) that introduces something new, e.g. an objective function, optimiser or model, that I can try to implement myself in Python with torch or keras. I mainly want to do this to learn new ideas and improve my coding skills.

Do you have any recommendations or alternatively any advice for how to find new interesting papers for someone who isn’t a researcher? I’ve looked on arxiv but I haven’t found what I’m looking for.

Comments

Small-Reason-8096 t1_irr6g0q wrote on October 10, 2022 at 12:44 PM

Hands down the best paper I have ever read (and reimplemented) is the ResNets paper:

https://arxiv.org/abs/1512.03385

The descriptions are clear and concise - but with enough detail to reimplement in whatever framework you like. Also, OOTB the results I got on CIFAR10 matched the paper pretty much perfectly (not always a given!).

Another good paper to try is AWD-LSTM: https://arxiv.org/pdf/1708.02182.pdf

Basically, if you are implementing and training from scratch, focus on something you can train with a smallish dataset in a reasonable period of time. I would generally steer away from LLMs and object detection / segmentation models, as they require more resources to train than are commonly available!

TheInfelicitousDandy t1_irsfw1a wrote on October 10, 2022 at 6:09 PM

I've tried to reimplement AWD-LSTM in pytorch > 1. and have never been able to get close to the original results. I've also seen other people try and not get close. Pretty sure it has to do with the weight dropout they used.

If anyone knows of any pytorch > 1. version that achieves the same PPL on PTB/Wiki02 I'd very much like to know.
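
For context, the weight dropout in AWD-LSTM is DropConnect applied to the hidden-to-hidden weight matrices: individual weights, rather than activations, are zeroed during training. A minimal plain-Python sketch of that idea follows. It assumes inverted-dropout rescaling by 1/(1-p); the function name `weight_drop` and the toy matrix are hypothetical, and this is not the authors' code.

```python
import random

def weight_drop(w_hh, p, rng, training=True):
    """Return a copy of w_hh with each weight zeroed with probability p.

    Assumption: surviving weights are rescaled by 1 / (1 - p), as in
    standard inverted dropout. Requires 0 <= p < 1.
    """
    if not training or p == 0.0:
        return [row[:] for row in w_hh]
    keep = 1.0 - p
    # The mask is sampled once per call, so the same dropped weights would be
    # reused at every timestep of the sequence processed with this matrix.
    return [[w / keep if rng.random() < keep else 0.0 for w in row]
            for row in w_hh]

rng = random.Random(0)                 # seeded only to make the demo repeatable
w_hh = [[0.5, -0.25], [1.0, 2.0]]      # toy hidden-to-hidden matrix
dropped = weight_drop(w_hh, p=0.5, rng=rng)
```

How a framework applies this mask to the recurrent weights internally is exactly the kind of detail that can differ between versions, so it is worth checking when a reproduction does not match.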

Small-Reason-8096 t1_irzvwc8 wrote on October 12, 2022 at 8:01 AM

That surprises me as there was a good Fastai version:

https://docs.fast.ai/text.models.awdlstm.html

which is built on pytorch. When I played with it ages ago the results seemed comparable to the paper, but I haven't revisited it for a while :)

TheInfelicitousDandy t1_is0ajet wrote on October 12, 2022 at 11:22 AM

As far as I know that version doesn't give comparable PPL.

Someone else saying the same https://github.com/salesforce/awd-lstm-lm/issues/86#issuecomment-453266265

A major issue here (and for other reproductions) is people saying they have a reproduction because they can run the code without errors, without ever actually getting the same results.

Der-Schwarzer-Schwan t1_irrvs7r wrote on October 10, 2022 at 3:54 PM

Try the NeRF paper. It will show you how to combine a network with an analytical function.
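
In NeRF the analytical part is the volume rendering step: the network predicts a density and colour at sample points along a ray, and a fixed quadrature formula composites them into a pixel. A rough plain-Python sketch of that compositing is below, with hard-coded numbers standing in for network outputs and a single colour channel for brevity. The function name `composite` and the sample values are made up for illustration.

```python
import math

def composite(sigmas, colors, deltas):
    """Composite per-sample densities and colours along one ray.

    sigmas: densities at each sample (would come from the network)
    colors: single-channel colours at each sample (also from the network)
    deltas: distances between adjacent samples
    """
    transmittance = 1.0   # fraction of light surviving up to the current sample
    pixel = 0.0
    weights = []
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this segment
        w = transmittance * alpha                # contribution of this sample
        weights.append(w)
        pixel += w * color
        transmittance *= 1.0 - alpha
    return pixel, weights

# Empty space, then a thin medium, then a nearly opaque surface.
pixel, weights = composite([0.0, 1.0, 50.0], [0.2, 0.5, 0.9], [0.1, 0.1, 0.1])
```

Because every operation here is differentiable, the same formula written with tensors lets gradients flow from the rendered pixel back into the network.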

pm_me_your_ensembles t1_irt5y8h wrote on October 10, 2022 at 9:06 PM

Phil Wang/lucidrains has phenomenal implementations of stuff, I'd recommend checking them out and reading their code.

Furthermore, I'd recommend simply reading more code and tackling complex problems, e.g. try building a DL framework from "scratch" on top of jax. Read the Haiku codebase, and compare it to say Equinox (I am a big fan of this one). Go through the huggingface code bases, e.g. transformers. Choose a model and build it from scratch and make it compatible with their API.

mlvpj t1_iru3rn3 wrote on October 11, 2022 at 1:25 AM

We have implemented a bunch of research papers here, most of which were picked because they were interesting and had something new to learn. You can probably pick any of them and it'll be fun to implement.

https://github.com/labmlai/annotated_deep_learning_paper_implementations

dasayan05 t1_irrfd7w wrote on October 10, 2022 at 1:59 PM

It doesn't matter which one you implement. Trying to implement anything from scratch always exposes you to deeper insights, which are hard to get by looking at dry mathematics on paper. Just one piece of advice: pick a paper/algo that is well known to work and is reproducible. Then you are good.

neuroguy123 t1_irtuo1b wrote on October 11, 2022 at 12:15 AM

I recommend some of the YOLO versions. I had fun with those and learned a lot about complex loss functions.

I also implemented a bunch of attention models, starting with Graves', through Bahdanau and Luong, and then Transformers. The history of attention in deep learning is very interesting and instructive to implement.
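
The Transformer end of that progression reduces to scaled dot-product attention, softmax(QK^T / sqrt(d)) V. Here is a small plain-Python sketch for a single head with no masking or learned projections. The helper names are hypothetical, and a real implementation would use batched tensor ops in torch or keras.

```python
import math

def softmax(xs):
    m = max(xs)                          # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d = len(keys[0])                     # key dimension used for scaling
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)        # one weight per key, summing to 1
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
# The first query points at the first key; the second query is all zeros,
# so it attends to both keys equally.
result = attention([[1.0, 0.0], [0.0, 0.0]], keys, values)
```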

Another one I had fun implementing was Wavenet as it really forces you to deep dive on convolution variations, pixel-cnn, and some gated network structures. Then conditioning it was an extra challenge (similar to the attention networks).
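
The core WaveNet building block is a dilated causal convolution feeding a gated activation, tanh(filter) * sigmoid(gate). A toy single-channel sketch in plain Python follows, assuming kernel size 2 and zero left-padding. The function names and weights are illustrative only.

```python
import math

def causal_dilated_conv(x, w, dilation):
    """Kernel-size-2 causal conv: y[t] = w[0] * x[t - dilation] + w[1] * x[t].

    Positions before the start of the signal are treated as zeros, so the
    output at time t never depends on inputs after t.
    """
    return [w[0] * (x[t - dilation] if t >= dilation else 0.0) + w[1] * x[t]
            for t in range(len(x))]

def gated_unit(x, w_filter, w_gate, dilation):
    """Gated activation: tanh(filter conv) * sigmoid(gate conv)."""
    f = causal_dilated_conv(x, w_filter, dilation)
    g = causal_dilated_conv(x, w_gate, dilation)
    return [math.tanh(a) * (1.0 / (1.0 + math.exp(-b))) for a, b in zip(f, g)]

x = [1.0, 0.0, 0.0, 0.0, 0.0]                          # unit impulse
y = causal_dilated_conv(x, [1.0, 0.0], dilation=2)     # pure delay by 2 steps
z = gated_unit(x, [1.0, 1.0], [0.0, 0.0], dilation=2)  # gate fixed at 0.5
```

Stacking such layers with dilations 1, 2, 4, 8 and so on is what grows the receptive field quickly with depth.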

One thing I've been meaning to get into is deepcut and other pose models because I don't know much about linear programming and the other math they use in those.
