new_name_who_dis_ t1_j71w8up wrote

If I recall correctly, ViT is a purely transformer based architecture. So you don't need to know RNNs or CNNs, just transformers.
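For a concrete picture of what "purely transformer based" means, here is a rough PyTorch sketch (class and parameter names are mine, not from the ViT paper or any official repo): the image is cut into patches, each patch is flattened and linearly projected, and the resulting sequence goes through a standard transformer encoder with a learnable [CLS] token. No convolutions or recurrence anywhere.

```python
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    """Illustrative sketch only -- not the reference ViT implementation."""
    def __init__(self, image_size=224, patch_size=16, dim=768, depth=12, heads=12, num_classes=1000):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        self.patch_size = patch_size
        self.to_patch_embedding = nn.Linear(3 * patch_size * patch_size, dim)  # linear projection of flattened patches
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))                  # learnable [CLS] token
        self.pos_embedding = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)          # plain transformer encoder
        self.head = nn.Linear(dim, num_classes)

    def forward(self, img):                                    # img: (B, 3, H, W)
        p = self.patch_size
        B, C, H, W = img.shape
        # Cut into non-overlapping p x p patches and flatten each one
        patches = img.unfold(2, p, p).unfold(3, p, p)          # (B, 3, H/p, W/p, p, p)
        patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * p * p)
        x = self.to_patch_embedding(patches)                   # (B, num_patches, dim)
        cls = self.cls_token.expand(B, -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embedding
        x = self.encoder(x)
        return self.head(x[:, 0])                              # classify from the [CLS] token
```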

7

JustOneAvailableName t1_j71yj42 wrote

Understanding the "what" is extremely easy and rather useless; to understand a paper you need to understand some level of the "why". If you have time to go in depth, aim to also understand the "what not" and "why not".

So I would argue at least some basic knowledge of CNNs is required.

2

SAbdusSamad OP t1_j71z0zp wrote

Well, I do have an idea about CNNs, and I have limited knowledge of RNNs. But I don't have any knowledge of "Attention Is All You Need".

1

Erosis t1_j72rzdl wrote

You'll probably be fine learning transformers directly, but a better understanding of RNNs might make some of the NLP tutorials/papers containing transformers more easily comprehensible.

Attention is a very important component of transformers, but attention can be applied to RNNs, too.
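For reference, pre-transformer attention looked roughly like this: at each decoding step, the decoder RNN scores the encoder's hidden states and mixes them into a context vector. This is a loose Luong-style dot-product sketch; the class name, dimensions, and layout are illustrative, not taken from any particular paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveGRUDecoderStep(nn.Module):
    """One decoder step of an RNN with dot-product attention over encoder states (sketch)."""
    def __init__(self, hidden_dim=256):
        super().__init__()
        self.gru = nn.GRUCell(hidden_dim * 2, hidden_dim)  # input = [prev token embedding ; context]

    def forward(self, prev_emb, prev_hidden, encoder_states):
        # encoder_states: (B, T, H) hidden states from an RNN encoder
        # prev_hidden:    (B, H)    decoder state from the previous step
        scores = torch.bmm(encoder_states, prev_hidden.unsqueeze(-1)).squeeze(-1)  # (B, T)
        weights = F.softmax(scores, dim=-1)                                        # attention weights
        context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)       # (B, H)
        hidden = self.gru(torch.cat([prev_emb, context], dim=-1), prev_hidden)     # new decoder state
        return hidden, weights
```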

3

SAbdusSamad OP t1_j759v4v wrote

I agree that having a background in RNNs and attention with RNNs can make the learning process for transformers, and by extension ViT, much easier.

1

tripple13 t1_j723bf0 wrote

I strongly disagree. Having an understanding of seq2seq models prior to Transformers goes a long way.

1

new_name_who_dis_ t1_j723k5w wrote

I mean, the more you understand the better, obviously. But it's not necessary; it's just context for what we don't do anymore.

2