asarig_ OP t1_j71wbqs wrote

Reply to comment by SatoshiNotMe in [R] Graph Mixer Networks by asarig_

Of course. MLP-Mixer is a new approach, first developed for image classification, that was proposed independently by Google and Oxford researchers in May 2021.

The MLP-Mixer, also known simply as "Mixer", is a type of image architecture that doesn't incorporate convolutions or self-attention. Instead, it relies solely on the use of multi-layer perceptrons (MLPs) that are repeatedly applied either to different spatial locations or feature channels.
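To make the "spatial locations or feature channels" part concrete, here is a minimal NumPy sketch of one Mixer block. It is only an illustration, not the code from the paper or from this project: the function names, the random initialisation, and the sizes `d_s` and `d_c` (hidden widths of the token-mixing and channel-mixing MLPs) are assumptions, and biases and learned normalisation parameters are left out to keep it short.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-6):
    # normalise over the last axis (no learned scale/shift in this sketch)
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def mlp(x, w1, w2):
    # two dense layers with a GELU in between (biases omitted)
    return gelu(x @ w1) @ w2

def mixer_block(x, tok_w1, tok_w2, ch_w1, ch_w2):
    # x has shape (n_tokens, n_channels)
    # token mixing: transpose so the MLP acts across tokens, once per channel
    y = layer_norm(x).T                        # (n_channels, n_tokens)
    x = x + mlp(y, tok_w1, tok_w2).T           # skip connection
    # channel mixing: the MLP acts across channels, once per token
    x = x + mlp(layer_norm(x), ch_w1, ch_w2)   # skip connection
    return x

def init_params(n_tokens, n_channels, d_s, d_c, seed=0):
    # hypothetical helper: small random weights for the two MLPs
    rng = np.random.default_rng(seed)
    return (
        rng.normal(0.0, 0.02, (n_tokens, d_s)),
        rng.normal(0.0, 0.02, (d_s, n_tokens)),
        rng.normal(0.0, 0.02, (n_channels, d_c)),
        rng.normal(0.0, 0.02, (d_c, n_channels)),
    )
```

Calling `mixer_block(x, *init_params(16, 8, 32, 24))` on an array `x` of shape `(16, 8)` returns an array of the same shape. Note that the token-mixing weights are tied to a fixed number of tokens, which is one of the things that has to be dealt with when moving from image patches to graphs.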

Transformers are what is normally applied to graphs. In this work, I instead tried using Mixers as a new kernel method on graphs, to see how they perform with linear complexity, avoiding the O(n^(2)) complexity of Transformers.

5

janpf t1_j75zh5u wrote

Ha, the funny thing is that in the Google paper, at least, they replace the O(n^(2)) with an O(n*D_S), where D_S is a constant, so it is linear. But it so happens that D_S > n in their studies, so it's not really faster :) ... (edit: there is another constant in the Transformer version as well, but effectively the Mixer was using the same order of magnitude of TPU time to train)
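A rough back-of-the-envelope sketch of that comparison, counting only the multiply-accumulates of the mixing step itself. The function names and the example sizes are assumptions for illustration; projections, softmax, and the channel-mixing MLP are ignored, which is part of the "other constant" mentioned above.

```python
def attention_mixing_macs(n, c):
    # Q @ K^T costs n*n*c and weights @ V costs another n*n*c;
    # the Q/K/V/output projections are deliberately ignored here
    return 2 * n * n * c

def token_mixing_macs(n, c, d_s):
    # two dense layers (n -> d_s -> n), applied to each of the c channels
    return 2 * n * d_s * c

if __name__ == "__main__":
    # illustrative sizes: n tokens, c channels, d_s token-mixing hidden width
    n, c, d_s = 196, 768, 384
    print("attention mixing:", attention_mixing_macs(n, c))
    print("token mixing:    ", token_mixing_macs(n, c, d_s))
```

Under these simplifications the ratio of the two counts is exactly D_S / n, so the Mixer's token mixing only comes out cheaper once n grows past D_S.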

But MLP-Mixers are a very interesting proposition anyway. Other types of mixers used are things like FFT (FNet).

3

gdpoc t1_j7337zm wrote

That is fascinating work.

I'd like to read the paper and will, given the time; are the results promising?

It seems plausible that a graph with a small branching factor could replicate the logarithmic search complexity of the input space, at least to some extent; I'm very interested in exploring this space.

2

asarig_ OP t1_j73g1ne wrote

Thanks for your interest. If you open an issue on GitHub about this, I will keep it in mind as a reminder, and I can share pre-trained weights at the appropriate time.

2