new_name_who_dis_ t1_j71w8up wrote

If I recall correctly, ViT is a purely transformer based architecture. So you don't need to know RNNs or CNNs, just transformers.
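For a concrete picture of what "purely transformer based" means, here is a rough PyTorch sketch (class and parameter names are mine, not from the ViT paper or any official repo): the image is cut into patches, each patch is flattened and linearly projected, and the resulting sequence goes through a standard transformer encoder with a learnable [CLS] token. No convolutions or recurrence anywhere.

```python
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    """Illustrative sketch only -- not the reference ViT implementation."""
    def __init__(self, image_size=224, patch_size=16, dim=768, depth=12, heads=12, num_classes=1000):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        self.patch_size = patch_size
        self.to_patch_embedding = nn.Linear(3 * patch_size * patch_size, dim)  # linear projection of flattened patches
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))                  # learnable [CLS] token
        self.pos_embedding = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)          # plain transformer encoder
        self.head = nn.Linear(dim, num_classes)

    def forward(self, img):                                    # img: (B, 3, H, W)
        p = self.patch_size
        B, C, H, W = img.shape
        # Cut into non-overlapping p x p patches and flatten each one
        patches = img.unfold(2, p, p).unfold(3, p, p)          # (B, 3, H/p, W/p, p, p)
        patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * p * p)
        x = self.to_patch_embedding(patches)                   # (B, num_patches, dim)
        cls = self.cls_token.expand(B, -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embedding
        x = self.encoder(x)
        return self.head(x[:, 0])                              # classify from the [CLS] token
```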

7

JustOneAvailableName t1_j71yj42 wrote

Understanding the "what" is extremely easy and rather useless; to understand a paper you need to understand some level of the "why". If you have time to go in depth, aim to also understand the "what not" and "why not".

So I would argue at least some basic knowledge of CNNs is required.

2

SAbdusSamad OP t1_j71z0zp wrote

Well, I do have an idea about CNNs, and I have limited knowledge of RNNs. But I don't have any knowledge of "Attention Is All You Need".

1

Erosis t1_j72rzdl wrote

You'll probably be fine learning transformers directly, but a better understanding of RNNs might make some of the NLP tutorials/papers containing transformers more easily comprehensible.

Attention is a very important component of transformers, but attention can be applied to RNNs, too.
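For reference, pre-transformer attention looked roughly like this: at each decoding step, the decoder RNN scores the encoder's hidden states and mixes them into a context vector. This is a loose Luong-style dot-product sketch; the class name, dimensions, and layout are illustrative, not taken from any particular paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveGRUDecoderStep(nn.Module):
    """One decoder step of an RNN with dot-product attention over encoder states (sketch)."""
    def __init__(self, hidden_dim=256):
        super().__init__()
        self.gru = nn.GRUCell(hidden_dim * 2, hidden_dim)  # input = [prev token embedding ; context]

    def forward(self, prev_emb, prev_hidden, encoder_states):
        # encoder_states: (B, T, H) hidden states from an RNN encoder
        # prev_hidden:    (B, H)    decoder state from the previous step
        scores = torch.bmm(encoder_states, prev_hidden.unsqueeze(-1)).squeeze(-1)  # (B, T)
        weights = F.softmax(scores, dim=-1)                                        # attention weights
        context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)       # (B, H)
        hidden = self.gru(torch.cat([prev_emb, context], dim=-1), prev_hidden)     # new decoder state
        return hidden, weights
```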

3

SAbdusSamad OP t1_j759v4v wrote

I agree that having a background in RNNs and attention with RNNs can make the learning process for transformers, and by extension ViT, much easier.

1

tripple13 t1_j723bf0 wrote

I strongly disagree. Having an understanding of seq2seq models prior to Transformers goes a long way.

1

new_name_who_dis_ t1_j723k5w wrote

I mean, the more you understand the better, obviously. But it's not necessary; it's just context for what we don't do anymore.

2