Submitted by LightGreenSquash t3_yrsqcz in MachineLearning
Hi all,
I'm in the first years of my PhD in Computer Vision, and obviously the vast majority of research in the field nowadays uses Deep Learning techniques. I like to think that I'm far from an absolute beginner, in the sense that:
- I've trained neural networks and more "traditional" ML models in a couple of courses, as well as for my MSc thesis, albeit almost out-of-the-box stuff.
- I have a decent understanding of Linear Algebra, Calculus and Probability Theory (undergrad courses from my CS degree). I say "decent" because I firmly believe that the more math you know, the more impressive the things you can do in AI, so I don't consider myself a math whiz; but judging by the math knowledge an average "How to get started with Deep Learning" blog post assumes, I'd say I'm well ahead. I'm also devoting some time every day to a more rigorous study of these areas, and eventually hope to expand into related ones.
- I can get through Deep Learning papers and usually obtain at least a basic understanding of what they're about and why they work, at least according to the authors and their experiments. I do still have some trouble with more state-of-the-art work, especially papers that also borrow ideas from NLP.
However, I don't really feel confident that I can actually produce useful research that investigates and/or uses these sorts of methods to do something new.
During undergrad, in order to actually understand most (if not all) concepts taught to me in programming and math, I'd actually do things with them: solve problems, prove statements, or just code with the goal of building some system or seeing how an idea actually works (e.g. polymorphism).
I realize, however, that this has not been the case with Deep Learning, at least for me: I've never tried to actually code a CNN or ResNet, much less a word2vec model, a Transformer, or any sort of generative model. Sure, I've read about how the first layers of a CNN learn edges etc., but I've never actually "seen it with my own eyes". Transformers in particular really trouble me: although I sort of understand the idea behind attention, I struggle to see what sort of features they end up using (in contrast to CNNs, where the idea of learning convolutional filters is much more intuitive to me).
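(To be fair, the raw attention computation itself is small enough to play with directly, which is part of why I feel I should have done this already. A minimal sketch in NumPy, assuming single-head, unmasked scaled dot-product attention; the names Q, K, V and the toy shapes are just illustrative:)

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # scores[i, j]: how much query position i attends to key position j
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_queries, n_keys)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights          # output is a weighted mix of values

# toy example: 3 query tokens, 3 key/value tokens, dimension 4
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
out, w = scaled_dot_product_attention(Q, K, V)
```

(Printing `w` for real inputs is exactly the kind of "seeing it with my own eyes" I mean: each row shows which tokens a query actually looks at.)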
Which brings me to the question: what's an efficient way to go from understanding a paper to actually feeling like you really, truly "grok" the material and could build on it, or use it in some scenario?
- Do you think implementing research papers from scratch, or almost from scratch, can be useful? Or is it way too time-consuming for someone already busy with a PhD? Is it even feasible, or are most papers (sadly) unreproducible without the authors' code?
- How do you manage to stay on track with such a rapidly evolving field, on any level beyond a completely surface understanding?
- How do you find a good balance between learning to use tools/frameworks, reading papers and gaining the deeper sort of understanding I mention?
manimino t1_ivykzs5 wrote
You should take some of Andrew Ng's courses on Coursera, such as the ConvNets course. He walks through the major architectures (U-Net, ResNet, etc.) in lectures, and you implement them in the homeworks.
Waaaay faster than banging your head against a paper. If you devote all your time for a week, you can get through the whole course. (Bit of a speedrun, you probably want two weeks for full comprehension.)
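For a taste of what those homework implementations boil down to: the key ResNet idea, the skip connection, is basically a one-liner. Here's a minimal NumPy sketch, assuming a single fully-connected residual block (the course uses convolutional blocks, but the principle is the same):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    # F(x) = relu(x @ W1) @ W2 is the learned residual; the skip
    # connection adds the input back, so the block only has to
    # learn a correction to the identity mapping.
    return relu(x + relu(x @ W1) @ W2)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))           # batch of 2, width 8
W1 = rng.normal(size=(8, 8)) * 0.1
W2 = rng.normal(size=(8, 8)) * 0.1
y = residual_block(x, W1, W2)

# With zero weights the residual vanishes and the block reduces
# to relu(x) -- i.e. it can trivially represent (near-)identity.
y_id = residual_block(x, np.zeros((8, 8)), np.zeros((8, 8)))
```

Seeing that the zero-weight block passes the input straight through is the whole "why deep ResNets train at all" intuition in one experiment.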