Author: Xifeng Yan
Sampled data is included in code link
Rule and pattern mining
gSpan: Graph-Based Substructure Pattern Mining
Author: (c) copyright 2002, 2003, Xifeng Yan, University of
Illinois at Urbana-Champaign
Reference: "X. Yan and J. Han, gSpan: Graph-Based Substructure
Pattern Mining, Proc. 2002 Int. Conf. on Data Mining (ICDM'02),
Maebashi,Japan, Dec. 2002." and its corresponding Technical Report
UIUCDCS-R-2002-2296 in CS, UIUC.
gspan filename -sFrequency [-o] [-d]
Parameters: Frequency is an integer.
Output frequent graph patterns. The output is located in "filename.fp".
The output is in the form of DFS codes.
The default output format is the same as the input format unless
this option is used.
This version of gSpan only accepts graphs with maximum 254 edges and 254
gspan Chemical_340 -s34
which asks gSpan to discover all frequent subgraphs whose frequency is
at least 34. Here, the input file is "Chemical_340".
The input format can be inferred from the file of Chemical_340:
"t # N" means the Nth graph,
"v M L" means that the Mth vertex in this graph has label L,
"e P Q L" means that there is an edge connecting the Pth vertex with the
Qth vertex. The edge has label L.
gSpan outputs the number of frequent subgraphs with at least one edge.
There are two kinds of output formats: Normal Output and DFS Code Output.
(1) Normal output:
t # id * support
vertex-edge list, same as the input format
"id" is an integer.
"support" is the frequency of the graph pattern.
(2) DFS code output:
<id> <support> [DFS code]
"id" is an integer.
"support" is the frequency of the following graph.
"DFS code" has the following fomrat:
(0) 0 (0f1) 3 (0f0) 0 (2f1) 3 (2f0) 0 (4f6) 5 (b0)
(0): "0" is the label of the first vertex.
0 (0f1): the 0th (the second ZERO) vertex has a forward edge with label "0"
(the first ZERO) and the new vertex has label "1";
0 (4f6): the 4th vertex has a forward edge with label "0" and the new
vertex has label "6";
5 (b0): the last vertex in the code (till this edge, please do not count
the vertices after this edge in the code if any) has a backward
edge with label "5" and it connects to 0th vertex;
NOTES: DFS code is introduced in our ICDM'02 paper. Here the code is in a
compact representation format. Each vertex is assigned a subscript
based on its position in a DFS search.
May 1, 2004