I am working on a project where my dataset consists of programs Each program is to be represented as a graph And I want to perform ‘between-graph clustering’ (clustering similar graphs)

So far all the literature I have seen talks about within graph clustering (clustering of similar nodes in a single graph)

Does anyone know of any resources that could help me with my project?

Comments

You must log in or register to comment.

resented_ape t1_irr5f0k wrote on October 10, 2022 at 12:35 PM

Your problem is really how to measure similarity between graphs. If you can do that, you can use pretty much any standard clustering technique.

This task is common in drug discovery and related cheminformatics fields where small molecules are represented as graphs. One common approach there is to generate a dictionary of subgraphs and then represent each molecule as a binary vector where a 1 means the subgraph is present and 0 otherwise. There is an obvious extension to counts of features (i.e. you have an integer vector with a N if the subgraph appears N times). This strategy crucially depends on how nodes and edges are labeled.

The similarity measure is usually something like Jaccard (often referred to as Tanimoto in the chemistry literature).

No_Performer203 OP t1_irs5cfd wrote on October 10, 2022 at 4:58 PM

Thank you 🙏🏼

tacixat t1_irqufhl wrote on October 10, 2022 at 10:30 AM

Markov clustering! Built for graphs, does not require a fixed number of clusters.

https://towardsdatascience.com/markov-clustering-algorithm-577168dad475

No_Performer203 OP t1_irr14im wrote on October 10, 2022 at 11:52 AM

Thank you 🙏🏼 , I’ll look it up

ID4gotten t1_irqs2cs wrote on October 10, 2022 at 9:56 AM

I assume you mean a mathematical graph structure and not a plot. Try searching for graph "similarity" instead of "clustering".

No_Performer203 OP t1_irqs76j wrote on October 10, 2022 at 9:58 AM

Yes, thank you for your reply 🙏🏼, I will look that up

felipeasg t1_irtrdlg wrote on October 10, 2022 at 11:48 PM

Clustering basicaly is a tool to help in an exploratoy analysis. There are another aproaches that can help you to investigate graphs.

One approach from NLP (Natural Language Processing) world is called embedding (https://en.m.wikipedia.org/wiki/Word_embedding). You can use some techniques to represent your graph in a vector space like: https://github.com/guoji-fu/Event2vec

This technique is particularly good if your graph is big and has a stochastic nature.

In python you can use this libraries to work with graphs. https://networkx.org/documentation/stable/tutorial.html

And this to sample graphs: https://brandonrozek.com/blog/networkx-random-sample-graph/ https://little-ball-of-fur.readthedocs.io/en/latest/index.html

others references: https://en.m.wikipedia.org/wiki/Graph_embedding https://link.medium.com/rW0beBuz1tb

WikiMobileLinkBot t1_irtrf6p wrote on October 10, 2022 at 11:49 PM

Desktop version of /u/felipeasg's links:

<https://en.wikipedia.org/wiki/Word_embedding>
<https://en.wikipedia.org/wiki/Graph_embedding>

^([)^(opt out)^(]) ^(Beep Boop. Downvote to delete)

wealthyMoss t1_irsglr8 wrote on October 10, 2022 at 6:13 PM

My thought is that if you have a set of graphs and graphs can be represented as adjacency matrices or lists, then essentially you can represent these graphs as a collection of dictionaries or data frames. Then you concatenate these into on giant dictionary or data frame.

For example, say you have 2 graphs each with nodes {a,b,c}. Then you could have a data frame with three columns and three rows for each that consist of the adjacency matrix. Add a new column to each to determine with graph is which (say column named “graph” and for the first graph it would 1 and 2 for second and so one). Then you can concatenate these matrices in a row wise manner to get a 6 row 4 column data frame where the columns are a,b,c, and graph. Then you can use whatever clustering technique you want.

I haven’t tried this but this would be my initial approach. Hope this helps and gives a new way of looking at the data!

I am also curious to see if others think this would be a valid approach as well!

cantfindaname2take t1_is7b59h wrote on October 13, 2022 at 8:38 PM

If your graphs are usually small then you might want to look into Graph Kernels: https://ysig.github.io/GraKeL/0.1a8/documentation/introduction.html

vwings t1_is99t4r wrote on October 14, 2022 at 6:08 AM

Check contrastive pre-training of graph neural networks. The clustering methods should work well for those pretrained representations. Furthermore: do the graphs come from a particular domain? You might find some useful pre-training objectives then..