Submitted by asarig_ t3_10sj2qf in MachineLearning
janpf t1_j75zh5u wrote
Reply to comment by asarig_ in [R] Graph Mixer Networks by asarig_
Ha, the funny thing is that in the Google paper at least they replace the O(n^(2)) by a O(n*D_S), where D_S is a constant, so linear. But it so happens that D_S > n in their studies, so it's not really faster :) ... (edit: there is another constant in the transformers version also, but effectively the mixer was using same order of magnitute amount of TPU time to train)
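To make the cost comparison concrete, here's a minimal NumPy sketch (not the paper's code; shapes and D_S value are made up) of a token-mixing MLP for a single feature channel. The mixing step is n -> D_S -> n, so it costs on the order of n*D_S multiply-adds per channel, versus the ~n^2 token-token scores of self-attention:

    import numpy as np

    # Hypothetical sizes: if D_S > n, the "linear" mixer does more work
    # per channel than the n^2 attention scores it replaces.
    n, D_S = 64, 256

    rng = np.random.default_rng(0)
    x = rng.normal(size=n)                    # one feature channel across n tokens/nodes

    # Token-mixing MLP: n -> D_S -> n, cost ~ O(n * D_S) per channel.
    W1 = rng.normal(size=(D_S, n)) / np.sqrt(n)
    W2 = rng.normal(size=(n, D_S)) / np.sqrt(D_S)
    mixed = W2 @ np.maximum(W1 @ x, 0.0)      # ReLU here for brevity; Mixer papers use GELU

    print(mixed.shape)                        # (64,)

So the asymptotics only help when D_S stays well below n, which is exactly the point above.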
But MLP-Mixers are a very interesting proposition anyway. Other token-mixing approaches include FFT-based mixing (FNet).
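For reference, a rough sketch of FNet-style mixing (assumed shapes, not the original implementation): apply an FFT along the token axis and the hidden axis, then keep the real part, so the mixing step costs about O(n log n) along the sequence instead of O(n^2):

    import numpy as np

    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(64, 128))       # (n tokens, d hidden), made-up sizes

    # 2D DFT over both axes, keeping only the real part, as in FNet.
    mixed = np.real(np.fft.fft(np.fft.fft(tokens, axis=0), axis=1))
    print(mixed.shape)                        # (64, 128)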