Submitted by LightGreenSquash t3_yrsqcz in MachineLearning
Hi all,
I'm in the first years of my PhD in Computer Vision, and obviously the vast majority of research in the field nowadays uses Deep Learning techniques. I like to think that I'm far from an absolute beginner, in the sense that:
- I've trained neural networks and more "traditional" ML models in a couple of courses, as well as for my MSc thesis, albeit almost out-of-the-box stuff.
- I have a decent understanding of Linear Algebra, Calculus and Probability Theory (undergrad courses from my CS degree). I say "decent" because I firmly believe that the more math you know, the more impressive the things you can do in AI, so I don't consider myself a math whiz; but judging by the math knowledge an average "How to get started with Deep Learning" blog post assumes, I'd say I'm well ahead. I'm also devoting some time every day to a more rigorous study of these areas, and eventually hope to expand into related ones.
- I can get through Deep Learning papers and usually obtain at least a basic understanding of what they're about and why they work, at least according to the authors and their experiments. I do still have some trouble with more state-of-the-art work, especially papers that also borrow ideas from NLP.
However, I don't really feel confident that I can actually produce useful research that investigates and/or uses these sorts of methods to do something new.
During undergrad, in order to actually understand most (if not all) concepts taught to me in programming and math, I'd actually do things with them: solve problems, prove statements, or just code with the goal of building some system or seeing how an idea actually works (e.g. polymorphism).
I realize, however, that this has not been the case with Deep Learning, at least for me: I've never tried to actually code a CNN or ResNet, much less a word2vec model, a Transformer, or any sort of generative model. Sure, I've read about how the first layers of a CNN learn edges etc., but I've never actually "seen it with my own eyes". Transformers in particular really trouble me: although I sort of understand the idea behind attention, I struggle to see what sort of features they end up using (in contrast to CNNs, where the idea of learning convolutional filters is much more intuitive to me).
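(To be fair, the raw attention computation itself is small enough to play with directly, which is part of why I feel I should have done this already. A minimal sketch in NumPy, assuming single-head, unmasked scaled dot-product attention; the names Q, K, V and the toy shapes are just illustrative:)

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # scores[i, j]: how much query position i attends to key position j
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_queries, n_keys)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights          # output is a weighted mix of values

# toy example: 3 query tokens, 3 key/value tokens, dimension 4
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
out, w = scaled_dot_product_attention(Q, K, V)
```

(Printing `w` for real inputs is exactly the kind of "seeing it with my own eyes" I mean: each row shows which tokens a query actually looks at.)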
Which brings me to the question: what's an efficient way to go from understanding a paper to actually feeling like you really, truly "grok" the material and could build on it, or use it in some scenario?
- Do you think implementing research papers from scratch, or almost from scratch, can be useful? Or is it way too time-consuming for someone already busy with a PhD? Is it even feasible, or are most papers (sadly) unreproducible without the authors' code?
- How do you manage to stay on track with such a rapidly evolving field, on any level beyond a completely surface understanding?
- How do you find a good balance between learning to use tools/frameworks, reading papers and gaining the deeper sort of understanding I mention?
manimino t1_ivykzs5 wrote
You should take some of Andrew Ng's courses on Coursera, such as the ConvNets course. He walks through the major architectures (U-Net, ResNet, etc.) in lectures, and you implement them in the homeworks.
Waaaay faster than banging your head against a paper. If you devote all your time for a week, you can get through the whole course. (Bit of a speedrun, you probably want two weeks for full comprehension.)
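For a taste of what those homework implementations boil down to: the key ResNet idea, the skip connection, is basically a one-liner. Here's a minimal NumPy sketch, assuming a single fully-connected residual block (the course uses convolutional blocks, but the principle is the same):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    # F(x) = relu(x @ W1) @ W2 is the learned residual; the skip
    # connection adds the input back, so the block only has to
    # learn a correction to the identity mapping.
    return relu(x + relu(x @ W1) @ W2)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))           # batch of 2, width 8
W1 = rng.normal(size=(8, 8)) * 0.1
W2 = rng.normal(size=(8, 8)) * 0.1
y = residual_block(x, W1, W2)

# With zero weights the residual vanishes and the block reduces
# to relu(x) -- i.e. it can trivially represent (near-)identity.
y_id = residual_block(x, np.zeros((8, 8)), np.zeros((8, 8)))
```

Seeing that the zero-weight block passes the input straight through is the whole "why deep ResNets train at all" intuition in one experiment.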