Submitted by No_Performer203 t3_y09znb in MachineLearning
I am working on a project where my dataset consists of programs. Each program is to be represented as a graph, and I want to perform ‘between-graph clustering’ (clustering similar graphs).
So far, all the literature I have seen talks about within-graph clustering (clustering similar nodes within a single graph).
Does anyone know of any resources that could help me with my project?
resented_ape t1_irr5f0k wrote
Your problem is really how to measure similarity between graphs. If you can do that, you can use pretty much any standard clustering technique.
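To make that concrete, here is a minimal sketch, assuming you already have a pairwise similarity matrix for your graphs. The matrix values, the `cluster_by_threshold` name, and the 0.5 threshold are all made up for illustration. It does single-linkage clustering by linking any two graphs whose similarity clears the threshold, but any clustering method that accepts precomputed similarities or distances would do.

```python
def cluster_by_threshold(sim, threshold):
    """Single-linkage clustering: link items whose similarity is at least
    `threshold`, then return the connected components as lists of indices."""
    n = len(sim)
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical similarities between four graphs (symmetric, 1.0 on the diagonal).
similarity = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.7],
    [0.2, 0.1, 0.7, 1.0],
]

clusters = cluster_by_threshold(similarity, threshold=0.5)
print(clusters)  # [[0, 1], [2, 3]]
```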
This task is common in drug discovery and related cheminformatics fields, where small molecules are represented as graphs. One common approach there is to generate a dictionary of subgraphs and then represent each molecule as a binary vector, where a 1 means the subgraph is present and a 0 means it is absent. There is an obvious extension to counts of features (i.e. you have an integer vector with an N if the subgraph appears N times). This strategy depends crucially on how nodes and edges are labeled, because two subgraphs only count as the same feature if their labels match.
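A rough sketch of the fingerprint idea, under a big simplifying assumption: the only "subgraphs" in the dictionary are single labeled edges, which is the smallest case. Real cheminformatics fingerprints use larger paths or neighborhoods. The graph encoding as `(source label, edge label, target label)` tuples, the example programs, and all function names here are hypothetical.

```python
from collections import Counter


def edge_features(graph):
    """Count each labeled edge in the graph (one edge = one tiny subgraph)."""
    return Counter(graph)


def build_dictionary(graphs):
    """Collect every feature seen in any graph, in a fixed (sorted) order."""
    features = set()
    for g in graphs:
        features.update(edge_features(g))
    return sorted(features)


def count_vector(graph, dictionary):
    """Integer vector: entry k is how often dictionary[k] occurs in the graph."""
    counts = edge_features(graph)
    return [counts[f] for f in dictionary]


def binary_vector(graph, dictionary):
    """Binary vector: 1 if the feature is present at all, otherwise 0."""
    return [1 if c > 0 else 0 for c in count_vector(graph, dictionary)]


# Two made-up program graphs, each a list of labeled edges.
prog_a = [("assign", "next", "loop"), ("loop", "body", "call"), ("loop", "body", "call")]
prog_b = [("assign", "next", "call"), ("loop", "body", "call")]

dictionary = build_dictionary([prog_a, prog_b])
print(binary_vector(prog_a, dictionary))  # [0, 1, 1]
print(count_vector(prog_a, dictionary))   # [0, 1, 2]
print(binary_vector(prog_b, dictionary))  # [1, 0, 1]
```

Note how the labels decide everything: if the nodes were labeled more coarsely (say, all as "stmt"), the two programs would share more features and look more similar.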
The similarity measure is usually something like Jaccard (often referred to as Tanimoto in the chemistry literature).
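For reference, a small sketch of both variants. The binary version is plain Jaccard (shared features divided by features present in either vector). For count vectors I am using the min/max generalization, often called weighted Jaccard or Ruzicka similarity, which is one common choice and not the only one. The function names, the example vectors, and the convention of returning 1.0 for two all-zero vectors are my assumptions.

```python
def tanimoto_binary(a, b):
    """Jaccard/Tanimoto for binary vectors: |A and B| / |A or B|."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumption: two empty fingerprints are treated as identical.
    return both / either if either else 1.0


def tanimoto_counts(a, b):
    """Weighted Jaccard (Ruzicka) for count vectors: sum(min) / sum(max)."""
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    # Assumption: two all-zero count vectors are treated as identical.
    return num / den if den else 1.0


# Made-up fingerprints for two graphs over a three-feature dictionary.
print(tanimoto_binary([0, 1, 1], [1, 0, 1]))  # 1 shared feature out of 3 present
print(tanimoto_counts([0, 1, 2], [1, 0, 1]))  # 0.25
```

If your clustering method wants distances instead of similarities, 1 minus the similarity is the usual conversion.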