Submitted by LightGreenSquash t3_yrsqcz in MachineLearning
Hi all,
I'm in the first years of my PhD in Computer Vision, and obviously the vast majority of research in the field nowadays uses Deep Learning techniques. I like to think that I'm far from an absolute beginner in the sense that:
- I've trained neural networks and more "traditional" ML models in a couple of courses, as well as for my MSc thesis, albeit mostly with out-of-the-box stuff.
- I have a decent understanding of Linear Algebra, Calculus and Probability Theory (undergrad courses from a CS degree). I say "decent" because I firmly believe that the more math one knows, the more impressive the things they can do in AI, so I really don't consider myself a math whiz; but judging from the math knowledge the average "How to get started with Deep Learning" blog post assumes, I'd say I'm well ahead. I'm also devoting some time every day to a more rigorous study of these areas, hoping eventually to expand into related ones.
- I can get through Deep Learning papers and usually obtain at least a basic understanding of what they're about, as well as why they work, at least according to the authors and their experiments. I do still have some trouble with more state-of-the-art works, especially ones that also borrow ideas from NLP.
However, I don't really feel confident that I can actually produce useful research that investigates and/or uses these sorts of methods to do something new.
During undergrad, in order to actually understand most, if not all, of the concepts taught to me in programming and math, I'd actually do things with them: solve problems, prove statements, or just code with the goal of building some system or seeing how an idea actually works (e.g. polymorphism).
I realize, however, that this has not been the case with Deep Learning, at least for me: I've never tried to actually code a CNN or ResNet, much less a word2vec model, a Transformer, or any sort of generative model. Sure, I've read about how the first layers of a CNN learn edges etc., but I've never actually "seen it with my own eyes". Transformers in particular really trouble me. Although I sort-of understand the idea behind attention etc., I struggle to see what sort of features they end up using (in contrast to CNNs, where the idea of learning convolutional filters is much more intuitive to me).
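For concreteness, the part of attention I do sort-of follow is the core computation itself, which is small enough to write out directly. Here's a toy single-head sketch in NumPy (random weights, no masking, no multi-head or positional machinery, so it's only a sketch of the mechanism, not a Transformer):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product attention over a sequence X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (seq, seq) pairwise similarities
    weights = softmax(scores, axis=-1)  # each row is a distribution over positions
    return weights @ V, weights         # output = weighted mixture of value vectors

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))        # 4 token embeddings
Wq = rng.normal(size=(d_model, d_k))
Wk = rng.normal(size=(d_model, d_k))
Wv = rng.normal(size=(d_model, d_k))
out, w = attention(X, Wq, Wk, Wv)
print(out.shape, w.shape)  # (4, 8) (4, 4)
```

Printing `w` for a trained model is exactly the "seeing it with my own eyes" step I'm missing: each row shows which other positions a token mixes information from, which is the closest analogue I know of to looking at a CNN's learned filters.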
Which brings me to the question of what's an efficient way to go from understanding a paper to actually feeling like you really, truly, "grok" the material and could build on it, or use it in some scenario?
- Do you think implementing research papers from scratch, or almost from scratch, can be useful? Or is it way too time-consuming for someone already busy with a PhD? Is it even feasible, or are most papers, sadly, unreproducible without the authors' code?
- How do you manage to stay on track with such a rapidly evolving field, on any level beyond a completely surface understanding?
- How do you find a good balance between learning to use tools/frameworks, reading papers and gaining the deeper sort of understanding I mention?
haljm t1_ivwwvj0 wrote
Find people to talk to! I definitely agree that the problem with learning by doing in ML is that doing it thoroughly would amount to several full-time jobs.
What I would suggest (and how I try to get by): find someone -- or several someones -- to discuss ideas with. Instead of learning by doing, learn by discussing how you would hypothetically do something, e.g. solve a certain problem or extend a method to do something else. Your advisor is great for this. If your advisor doesn't have time, try other professors, postdocs, more senior PhD students, or basically anyone who will talk to you.
As for keeping track of the field, my opinion is that there's not actually that much truly new stuff in ML, and everything basically builds on the same themes in different ways. Once you're sufficiently familiar with the general themes, a high-level read is enough to understand new work. I basically never read papers beyond the high level, except when I'm considering using a paper's method in my own research.