GLEMOS: Benchmark for Instantaneous
Graph Learning Model Selection

Namyong Park\(^1\)1, Ryan Rossi\(^2\), Xing Wang\(^1\), Antoine Simoulin\(^1\),
Nesreen Ahmed\(^3\), Christos Faloutsos\(^4\)
\(^1\)Meta AI \(^2\)Adobe Research \(^3\)Intel Labs \(^4\)Carnegie Mellon University


Abstract

The choice of a graph learning (GL) model (i.e., a GL algorithm and its hyperparameter settings) has a significant impact on the performance of downstream tasks. However, selecting the right GL model becomes increasingly difficult and time consuming as more and more GL models are developed. Accordingly, it is of great significance and practical value to equip users of GL with the ability to perform a near-instantaneous selection of an effective GL model without manual intervention. Despite the recent attempts to tackle this important problem, there has been no comprehensive benchmark environment to evaluate the performance of GL model selection methods. To bridge this gap, we present GLEMOS in this work, a comprehensive benchmark for instantaneous GL model selection that makes the following contributions. (i) GLEMOS provides extensive benchmark data for fundamental GL tasks, i.e., link prediction and node classification, including the performances of 366 models on 457 graphs on these tasks. (ii) GLEMOS designs multiple evaluation settings, and assesses how effectively representative model selection techniques perform in these different settings. (iii) GLEMOS is designed to be easily extended with new models, new graphs, and new performance records. (iv) Based on the experimental results, we discuss the limitations of existing approaches and highlight future research directions. To promote research on this significant problem, we make the benchmark data and code publicly available at https://github.com/facebookresearch/glemos.

1 Introduction↩︎

Graph learning (GL) methods [1], [2] have achieved great success across multiple domains and applications that involve graph-structured data [3][10]. At the same time, previous studies [11][13] have shown that there is no universally good GL model that performs best across

Figure 1: Via instantaneous graph learning model selection, the best model can be found without performing computationally expensive model training and evaluations.

all graphs and graph learning tasks. Therefore, to effectively employ GL given a wide array of available models, it is important to select the right GL model (i.e., a GL algorithm and its hyperparameter settings) that will perform well for the given graph data and GL task.

Ideally, we would want to be able to select the best GL model for the given graph near-instantaneously, that is, without having to train or evaluate different models multiple times on the new graph, since even a few such training and evaluations might take a considerable amount of time and resources (). Enabling an instantaneous model selection for a completely new graph involves addressing several technical challenges, which includes modeling how well different GL methods perform on various graphs, and establishing a connection between the new graph and observed graphs, such that the best model for the new graph can be estimated in light of observed model performances on similar graphs.

Problem Formulation. With these considerations, we formally define this important problem, which we call Instantaneous Graph Learning Model Selection, and the related terms as follows.

Graph: Let \(G=(V, E, X, Y)\) be a graph where \(V \subseteq \mathbb{N}\) and \(E=\{ (i, j) \mid i, j \in V \}\) denote the sets of nodes and edges, respectively; \(X\) denotes the input features, which can be node features (\(X \in \mathbb{R}^{|V| \times F_N}\)), edge features (\(X \in \mathbb{R}^{|E| \times F_E}\)), or a set of both, where \(F_N\) and \(F_E\) denote the dimension of corresponding input features; and \(Y\) denotes node labels (\(Y \in \mathbb{N}^{|V|}\)) or edge labels (\(Y \in \mathbb{N}^{|E|}\)). Note that input features \(X\) and labels \(Y\) are considered optional since not all graphs have this information.

Model: A model \(M\) refers to a GL method for the given GL task, such as link prediction, with specific hyperparameter settings. In general, a GL model consists of two components, namely, (graph embedding method, hyperparameters) and (predictor, hyperparameters), where the former produces a vector representation of the graph (e.g., node embeddings) and the latter makes task-specific predictions (e.g., link prediction) given the embeddings. The set \(\mathbcal{M}\) of models, from which the model selection is made, is normally heterogeneous, where the configuration of each model is unique in the choice of its two components and their hyperparameter settings.

Performance Matrix: Let \(\boldsymbol{\mathrm{P}}\in \mathbb{R}^{n \times m}\) be a matrix containing observed model performances, where \(P_{ij}\) is the performance (e.g., accuracy) of model \(j\) on graph \(i\). \(\boldsymbol{\mathrm{P}}\) can be sparse with missing entries.

Problem 1 (Instantaneous Graph Learning Model Selection).  

Given

  • a training meta-corpus of \(n\) graphs \(\mathbcal{G} = \{G_i\}_{i=1}^{n}\) and \(m\) models \(\mathbcal{M} = \{M_j\}_{j=1}^{m}\) for a GL task (e.g., link prediction and node classification):

    1. performance matrices \(\{\boldsymbol{\mathrm{P}}_k\}_{k=1}^{\ell}\), i.e., \(\ell\) records of \(m\) models’ performance on \(n\) graphs

    2. input features of the graphs in \(\mathbcal{G}\) (if available)

    3. configurations (i.e., a GL method and its hyperparameter settings) of \(m\) models in \(\mathbcal{M}\)

  • an unseen test graph \(G_{\rm test}\notin \mathbcal{G}\)

Select

  • the best model \(M^{*} \in \mathbcal{M}\) for \(G_{\rm test}\) without* training or evaluating any model in \(\mathbcal{M}\) on \(G_{\rm test}\).*

Status Quo and Our Contributions. In recent years, several methods have been developed for an efficient selection of GL models. However, most of them cannot tackle Prob. 1 as they require multiple rounds of model training and evaluations; we review these methods in Sec. 2. Most recently, a subset of Prob. 1 was studied by MetaGL [13], which proposed a GL model selection technique that assumes plain graphs without input features, and operates without utilizing model configurations. A few recent works [12], [14], [15] also provide performances of graph neural networks (GNNs), although they cannot address Prob. 1. While the datasets used in [12][15] are available, they fall short of being a comprehensive benchmark environment to study this significant problem due to the following reasons.

  • Limited GL Task and Data. Focusing on link prediction, MetaGL [13] only provides link prediction performances, and does not support other widely-used tasks, such as node classification, which limits follow-up studies and use of the benchmark for different GL tasks. Also, other related works [12], [14], [15] are limited in terms of the number and diversity of graphs they cover ().

  • Limited Evaluation Settings. Some important evaluation settings were not considered in MetaGL’s benchmark, such as out-of-domain and small-to-large settings as we later describe, which can be useful in evaluating the performance of model selection techniques in different practical settings.

  • Limited Extensibility. The sets of models and graphs are assumed to be fixed, and it is not easy to extend the benchmark with new graphs and models in a consistent and reproducible manner.

In this work, we address these limitations by developing a comprehensive benchmark for instantaneous graph learning model selection. Overall, the contributions of this work are as follows.

  • Extensive Benchmark Data with Multiple GL Tasks. We construct a benchmark dataset that includes the performances of 366 models on 457 graphs over fundamental GL tasks, i.e., link prediction and node classification, which is by far the largest benchmark for Prob. 1 to our knowledge. The benchmark also provides meta-graph features to capture the structural characteristics of graphs.

  • Comprehensive Evaluation Testbeds. We evaluate ten representative methods for , including both classical methods and deep learning-based ones, using multiple evaluation settings designed to assess the quality of model selection techniques from practical perspectives.

  • Extensible Open Source Benchmark Environment. Our benchmark is designed to be easily extended with new models, new graphs, and new performance records. To promote further research on this significant problem, we make the benchmark environment publicly available.

  • Future Research Directions. We discuss the limitations of existing model selection methods, and highlight future research directions towards an instantaneous selection of graph learning models.

After reviewing related work in , we present the proposed benchmark data and testbeds in . Then we provide experimental results in , and conclude in .

2 Related Work↩︎

2.1 Model Selection↩︎

Model selection refers to the process of selecting a learning algorithm and its hyperparameter settings. In this section, we review existing model selection approaches, which we divide into two groups depending on whether they require model evaluations (i.e., performance queries for the new dataset).

Evaluation-Based Model Selection: Most existing approaches to select machine learning models belong to this group, ranging from simple solutions, such as random search [16] and grid search [17], to more advanced and efficient ones that employ techniques such as adaptive resource allocation [18], early stopping [19], and Bayesian optimization [20][22]. Inspired by these advancements, several model selection methods were recently developed for graph learning (GL) models. To tackle challenges involved with GL model selection, these methods adapt existing ideas to GL models, such as reinforcement learning [23][25], evolutionary algorithm [26], Bayesian optimization [27], and hypernets [28], as well as developing techniques specific to graph data, e.g., subgraph sampling [27] and graph coarsening [29]. Note that all of the above approaches cannot tackle the instantaneous GL model selection problem () as they rely on multiple model evaluations for performance queries of different combinations of GL methods and hyperparameter settings on the new dataset.

Figure 2: GLEMOS provides a comprehensive benchmark environment, covering the steps required to achieve effective instantaneous GL model selection, with multiple options for major building blocks.

Instantaneous Model Selection: To select the best model without querying model performances on the new dataset, methods in this category typically utilize prior model performances or characteristic features of a dataset (i.e., meta-features). A simple approach [30] finds the globally best model (i.e., the one with the overall best performance over all observed datasets), and thus its model selection is independent of query datasets. This can be refined by narrowing the search scope to similar datasets, where dataset similarities are modeled in the meta-feature space, e.g., using \(k\)-nearest neighbors [31] or clustering [32]. Another line of methods [13], [33], [34] take a different approach, which aims to predict the model performance on the given dataset by learning a function that maps meta-features into estimated model performances. Due to their ability to learn such a function in a data-driven manner, this second group of methods generally outperformed the first group in previous studies [13], [34]. While the above methods are one of the first efforts to achieve instantaneous GL model selection, several open challenges remain to be solved, as we discuss in .

0.5em

2pt

Table 1: Comparison of GLEMOS with previous works providing performances of GNN models.
Testbeds
Selection Methods
Features
Models
Datasets
(max # nodes)
Domains
GNN-Bank-101 [14] GNNs 12 34k 5
NAS-Bench-Graph [15] GNNs 9 170k 4
GraphGym [12] GNNs 32 34k 7
GLEMOS (Ours)
(e.g., node2vec, label prop.) 457 496k 37

2.2 Benchmarks for Instantaneous Graph Learning Model Selection↩︎

A few recent works [12], [14], [15] address problems related to GL model selection, and provide performances of GNNs on different datasets. However, all of them perform evaluation-based model selection discussed above, which requires multiple rounds of model evaluations given a new dataset.

As the first benchmark for instantaneous GL model selection (Prob. 1), GLEMOS provides more than just a collection of performance records, i.e., (1) benchmark testbeds and (2) existing algorithms () for instantaneous model selection, as well as different sets of (3) meta-graph features (). These features (1)-(3) are not provided by these previous works [12], [14], [15]. Furthermore, GLEMOS provides more comprehensive and diverse performance records than these works in several aspects (e.g., in terms of included GL models and graph data distributions), as summarized in .

The two major components for instantaneous GL model selection are historical model performances of the GL task of interest (e.g., accuracy for node classification), and meta-graph features to quantify graph similarities. For each component, GLEMOS provides several options to choose from. Once these components are chosen, users select a model selection algorithm, as well as a benchmark testbed to perform evaluation, out of several options available in GLEMOS. Fig. 2 summarizes these steps to use GLEMOS. In the next sections, we describe what GLEMOS provides for these steps.

0.5em

6pt

Table 2: Summary statistics of the GLEMOS benchmark.
Node Classification Task Link Prediction Task
(# model performances on benchmark graphs) 41,856 152,070
Total graphs () 128 457
\(~\mathbin{\vcenter{\rule{.5ex}{.5ex}}}\) Num nodes 34–421,961 34–495,957
\(~\mathbin{\vcenter{\rule{.5ex}{.5ex}}}\) Num edges 156–7,045,181 156–7,045,181
\(~\mathbin{\vcenter{\rule{.5ex}{.5ex}}}\) Num node feats 2–61,278 2–61,278
\(~\mathbin{\vcenter{\rule{.5ex}{.5ex}}}\) Num node classes 2–195 n/a
\(~\mathbin{\vcenter{\rule{.5ex}{.5ex}}}\) Num graph domains 25 37
Total GL models () 327 350
Total meta-graph features () 58–1,074 58–1,074
Total model selection methods () 10 10
Total benchmark testbeds () 5 5

3 Graph Learning Tasks and Performance Collection↩︎

Prior model performances play an essential role in instantaneous GL model selection algorithms, as they can estimate a candidate GL model’s performance on the new graph based on its observed performances on similar graphs. GLEMOS provides performance collections for two fundamental graph learning tasks, i.e., node classification and link prediction. Below we discuss how the graphs and models are selected, and describe how model performances are evaluated for each GL task.

3.1 Graphs and Models↩︎

Graphs. Our principle of selecting the graphs in GLEMOS is to include diverse graph datasets, in terms of both the size and domain of the graph. The size of selected graphs ranges from a few hundred edges to millions of edges, and the graph set covers various domains, e.g., co-purchase networks, protein networks, citation graphs, and road networks. As listed in , the resulting graph set outperforms existing data banks in terms of the number and size of graphs, as well as the diversity of data domain. shows the summary statistics of graphs, and the graph list is given in .

Models. Our principle for selecting the models to include in GLEMOS is to cover representative and widely-used GL methods. We include graph neural network methods (e.g., GCN [35], GAT [36], and SGC [37]), random walk-based node embeddings (e.g., node2vec [38]), self-supervised graph representation learning methods (e.g., DGI [39]), and classical methods (e.g., spectral embedding [40]).
The resulting model set is more diverse than previous works, which considered GNNs alone ().

3.2 Node Classification↩︎

Graph Set. A subset of the graphs have node labels. Excluding the graphs without node labels, the node classification graph set is comprised of 128 graphs from 25 domains.

Model Set. Most methods in GLEMOS are applicable for both node classification and link prediction. In addition to these common methods, we also include label propagation [41], which can be used for node classification. The GL models evaluated for node classification and their hyperparameter settings are listed in . In total, 327 models comprise our model set for node classification.

Performance Collection. For node classification, supervised models are optimized to produce the class distribution. For unsupervised models, we first train them to produce latent node embeddings based on their own objective, and apply a trainable linear transform to transform embeddings into the class distribution. More details on the experimental settings are given in . To evaluate performance, we calculate multiple classification metrics, including accuracy, F1 score, average precision, and ROC AUC score. Given the graph set \(\mathbcal{G}\) and model set \(\mathbcal{M}\) described above, we construct the performance matrix \(\boldsymbol{\mathrm{P}}\) by evaluating every model \(M_j \in \mathbcal{M}\) on every graph \(G_i \in \mathbcal{G}\)i.e., \[\begin{align} P_{ij} = \text{Performance (\emph{e.g.}, accuracy, and ROC AUC) of model } M_j \in \mathbcal{M} \text{ on graph } G_i \in \mathbcal{G}. \end{align}\] Splitting: We generate the train-validation-test node splits with a ratio of 64%-16%-20%, respectively, and train each model applying validation-based early stopping. For reproducibility, we release all data splits, such that future model evaluations can be done using the same node splits.

Graph Set. As link prediction task does not require node labels for evaluation, we greatly expand the graph set used for node classification by adding 329 more graphs. With these graphs, the link prediction graph set consists of 457 graphs from 37 domains. in Appendix gives the full list.

Model Set. All models used for node classification are used for link prediction, except for label propagation, which requires node labels. We also add models designed for link prediction, e.g., SEAL [42] and Adamic/Adar [43]. In total, 350 models comprise the link prediction model set ().

Performance Collection. For link prediction, GL models are optimized to produce latent node embeddings, and we apply a dot product scoring between the two node embeddings, followed by a sigmoid function, to obtain the link probability between the corresponding nodes. We calculate multiple evaluation metrics to measure the link prediction performance, including average precision, ROC AUC score, and NDCG (normalized discounted cumulative gain) [44].

Splitting: We randomly split edges into train-validation-test sets, with a ratio of 64%-16%-20%, which form positive edge sets. For positive edges, we randomly select the same amount of negative edges (i.e., nonexistent edges), which form negative edge sets. Again, we release all edge splits.

0.25em 1.0pt

Table 3: Graph learning methods and their hyperparameter settings that comprise the model set \(\mathbcal{M}\). NC: Applicable for node classification. LP: Applicable for link prediction.
Method NC LP Hyperparameter Settings Count
GCN [35] act \(a \in \{\text{relu},\text{tanh},\text{elu}\}\), dropout \(d \in \{0.0,0.5\}\), hidden channels \(h \in \{16,64\}\), num layers \(\ell \in \{1,2,3\}\) 30
GraphSAGE [45] act \(a \in \{\text{relu},\text{tanh}\}\), aggr \(g \in \{\text{mean},\text{max}\}\), hidden channels \(h \in \{16,64\}\), jumping knowledge \(j \in \{\text{none},\text{last}\}\), num layers \(\ell \in \{1,2\}\) 24
GAT [36] concat \(c \in \{\text{true},\text{false}\}\), dropout \(d \in \{0.0,0.5\}\), heads \(n \in \{1,4\}\), hidden channels \(h \in \{16,64\}\), num layers \(\ell \in \{1,2,3\}\) 40
GIN [46] eps \(e \in \{0.0\}\), hidden channels \(h \in \{16,64\}\), num layers \(\ell \in \{1,2,3\}\), train eps \(t \in \{\text{true},\text{false}\}\) 10
EGC [47] aggregators \(a \in \{\text{[sum]},\text{[mean]},\text{[symnorm]},\text{[min]},\text{[max]},\text{[var]},\text{[std]}\}\), hidden channels \(h \in \{16,64\}\), num bases \(b \in \{4,8\}\), num layers \(\ell \in \{2\}\) 28
SGC [37] bias \(b \in \{\text{true},\text{false}\}\), num hops \(k \in \{1,2,3,4,5\}\) 10
ChebNet [48] Chebyshev filter size \(k \in \{1,2,3\}\), hidden channels \(h \in \{16,64\}\), normalization \(r \in \{\text{none},\text{sym},\text{rw}\}\), num layers \(\ell \in \{1,2\}\) 27
PNA [49] aggregators \(a \in \{\text{[sum]},\text{[mean]},\text{[max]},\text{[var]}\}\), hidden channels \(h \in \{16\}\), num layers \(\ell \in \{1,2\}\), scalers \(s \in \{\text{[identity]},\text{[amplification]},\text{[attenuation]},\text{[linear]}\}\), towers \(t \in \{1\}\) 32
Spectral Emb. [40] num components \(h \in \{16,64\}\), tolerance \(t \in \{0.1,0.01,0.001,0.0001\}\) 8
GraRep [50] num components \(h \in \{16,32,64\}\), power \(p \in \{1,2\}\) 6
DGI [39] encoder act \(a \in \{\text{prelu},\text{relu},\text{tanh}\}\), hidden channels \(h \in \{16,64\}\), summary \(s \in \{\text{mean},\text{max},\text{min},\text{var}\}\) 24
node2vec [38] context size \(w \in \{5,10\}\), hidden channels \(h \in \{16,64\}\), \(p \in \{1,2,4\}\), \(q \in \{1,2,4\}\), walk length \(l \in \{10,20\}\) 72
Label Prop. [41] alpha \(\alpha \in \{0.99,0.9,0.8,0.7\}\), num layers \(\ell \in \{1,2,3,4\}\) 16
Jaccard’s Coeff. [43] - 1
Resource Alloc. [51] - 1
Adamic/Adar [43] - 1
SEAL [42] GNN conv \(c \in \{\text{GCN},\text{SAGE},\text{GAT}\}\), GNN hidden channels \(g \in \{16,64,128\}\), \(k \in \{0.6,0.1\}\), MLP hidden channels \(m \in \{32,128\}\), num hops \(n \in \{1\}\) 36
366

4 Collection of Meta-Graph Features↩︎

Instantaneous model selection algorithms carry over historical model performances on various graphs to estimate how GL models would perform on a new graph. In that process, performance transfer can be done more effectively when we consider graph similarities, such that the performance transfer would be done adaptively based on the similarities between graphs. Structural meta-graph features provide an effective way to that end by summarizing a graph into a fixed-size feature vector in terms of its structural characteristics. GLEMOS provides various meta-graph features, which can capture important graph structural properties. Below we first discuss how GLEMOS generates fixed-length meta-graph features, as depicted in , and then describe the structural features included in GLEMOS, which are organized into three sets for convenience.

Figure 3: Meta-graph features summarize structural graph characteristics into a fixed-size feature vector.

Feature Generation. GLEMOS produces meta-features in two steps, following an earlier work [13].

\(\circ\)Step 1—Structural Feature Extraction: A structural meta-feature extractor \(\psi_k\) is a function that transforms an input graph \(G\) into a vector, which represents a distribution of structural features over nodes or edges. For example, degree and PageRank scores of all nodes correspond to node-level feature distributions, and triangle frequency for each edge corresponds to an edge-level feature distribution. In general, we apply a set of such extractors \(\Psi = \{ \psi_1, \ldots, \psi_K \}\) to the graph \(G\), obtaining a set \(\Psi(G) = \{\psi_1(G), \ldots, \psi_K(G) \}\) of multiple feature distributions.

\(\circ\)Step 2—Statistical Feature Summarization: Since the number of nodes or edges in each graph determines the size of the output from the meta-feature extractors \(\psi_k(G)\), those structural feature distributions cannot be directly used to compare graphs with different number of nodes or edges. Step 2 addresses this issue via statistical feature summarization, which applies a set \(\Sigma\) of statistical functions (e.g., mean, entropy, skewness, etc) that summarize feature distributions \(\psi_k(G)\) of varying size into fixed-length feature vectors; i.e., \(\dim(\Sigma(\psi_k(G_i))) = \dim (\Sigma(\psi_k(G_j)))\) for two graphs \(G_i\) and \(G_j\). By combining all \(K\) summaries, graph \(G\)’s meta-graph feature \(\boldsymbol{\mathrm{m}}\) is obtained to be \(\boldsymbol{\mathrm{m}}= [\Sigma(\psi_1(G)); \cdots; \Sigma(\psi_K(G)) ]\). lists the statistical functions \(\Sigma\) used in GLEMOS.

Collection of Meta-Graph Features. Different graph features may capture different structural properties. Thus, GLEMOS aims to provide representative and diverse graph features, which have been proven effective in earlier studies, while making it easy to work with any set of features. For the convenience of the users, we group the currently supported features into the following three sets: Mregular includes widely used features that capture structural characteristics of a graph at both node and graph levels; Mgraphlets considers features based on the frequency of graphlets, as they can provide additional information; Mcompact is intended to use the least space, while providing several important features that capture node-, edge-, and graph-level characteristics. The details of each set are as follows.

Mregular: This set includes 318 meta-graph features. We derive the distribution of node degrees, k-core numbers, PageRank scores, along with the distribution of 3-node paths (wedges) and 3-node cliques (triangles). Given these five distributions, we summarize each using the set of 63 statistical functions \(\Sigma\), giving us a total of 315 features. We include three additional features based on the density of graph \(G\) and the density of the symmetrized graph, along with the assortativity coefficient.

Mgraphlets: This set includes 756 meta-graph features. First, we derive the frequency of all 3 and 4-node graphlet orbits per edge in \(G\). Next, we summarize each of the 12 graphlet orbit frequency distributions using the set of 63 statistical functions \(\Sigma\), giving us a total of 756 meta-graph features.

Mcompact: This set consists of 58 total meta-graph features, including 9 simple statistics such as number of nodes and edges, density of the graph \(G\), max vertex degree, average degree, assortativity coefficient, and the maximum k-core number of \(G\), along with the mean and median k-core number of a vertex in \(G\). We also include the global clustering coefficient, total triangles, as well as the mean and median number of triangles centered at an edge. We further include the total 4-cliques as well as the mean and median number of 4-cliques centered at an edge. Besides the above 16 features, we also compute the frequency of all 3 and 4-node graphlet orbits per edge, and from these 12 frequency distributions, we derive the mean, median, and max. We also derive the graphlet frequency distribution from the counts of all six 4-node graphlets and include those values directly as features.

Note that the framework is flexible, and users can choose to use any set of features, either a subset of the current features (e.g., to further improve efficiency and use less space), or their superset (e.g., to capture distinct structural characteristics using different features in addressing new tasks).

5 Benchmark Testbeds and Algorithms↩︎

5.1 Benchmark Testbeds↩︎

GLEMOS provides multiple benchmark testbeds (i.e., evaluation settings and tasks) designed to assess model selection performance in different usage scenarios. We describe them in detail below.

Fully-Observed Testbed. In this setup, model selection algorithms are provided with a full performance matrix \(\boldsymbol{\mathrm{P}}\) for the given graph learning task, i.e., without any missing entry in \(\boldsymbol{\mathrm{P}}\). Accordingly, this testbed measures model selection performance in the most information-rich setting, where all models in the model set \(\mathbcal{M}\) have been evaluated on all observed graphs.

Splitting: We apply a stratified 5-fold cross validation, i.e., graphs are split into five folds, which are (approximately) of the same size, and balanced in terms of graph domains, and then as each fold (20%) is held out to be used for testing, the other folds (80%) are used for model training. Note that graph splits are used to split the performance matrix \(\boldsymbol{\mathrm{P}}\) and meta-graph features \(\boldsymbol{\mathrm{M}}\).

Sparse Testbed. The performance matrix \(\boldsymbol{\mathrm{P}}\) in this setting is sparse and partially observed, i.e., we may only have a few observations for each graph. This setting is important since it can be costly to add a new model to the benchmark, which requires training and evaluating the model multiple times on the graphs in the benchmark. By dealing with model selection using a sparse \(\boldsymbol{\mathrm{P}}\), this testbed addresses significant practical considerations, e.g., making it more cost-effective to be able to add new models to the benchmark. Using this testbed, researchers can develop and test specialized algorithms capable of learning from such partially-observed performances. To construct a sparse performance matrix \(\boldsymbol{\mathrm{P}}^{\prime}\), we sample uniformly at random \(pm\) values from each row of \(\boldsymbol{\mathrm{P}}\), where \(p\) is the fraction of values to sample and \(m\) is the total number of models. This graph-wise sampling strategy ensures the same number of observations for each graph, which matches the practical motivation that we have a limited budget per graph. For this benchmark, we use different sparsity levels \(p \in \{0.1,0.3,0.5,0.7,0.9\}\).

Splitting: We use the same stratified 5-fold cross validation as in Fully-Observed testbed. Algorithms are trained using a sparse split of \(\boldsymbol{\mathrm{P}}\), and evaluated with fully-observed performances of the test split.

Out-Of-Domain Testbed. Graphs from a particular domain (e.g., road graphs, social networks, and brain networks) often have similar characteristics to each other. In other words, graphs from a certain domain can be considered as having its own distribution, which makes model selection for graphs from a new domain a challenging task. This testbed evaluates the effectiveness of model selection methods for such an out-of-distribution setting by holding out graphs from a specific network domain, and trying to predict for the held-out domain by learning from graphs from all the other domains.

Splitting: We use a group-based 5-fold cross validation for this testbed such that each domain appears once in the test set across all folds.

Small-To-Large Testbed. Training a GL model can take a lot of time and resources, especially for large-scale graphs. While model selection methods may benefit from having more prior performances, having to obtain performance records for large graphs presents a significant computational bottleneck. The meta-training process can be made significantly faster by enabling model selection algorithms to learn from relatively small graphs to be able to predict for larger graphs. This testbed focuses on this challenging yet practical setting, which evaluates the ability to generalize from small to large graphs.

Splitting: Graphs with less than \(\epsilon\) nodes form a small-graph set used for training. The other graphs with at least \(\epsilon\) nodes form a large-graph set, which is used for evaluation. We evaluate using a threshold value \(\epsilon\) of 10000 for this testbed.

Cross-Task Testbed. The above testbeds operate on the model performances measured for one specific type of GL task. By contrast, in this testbed, model selection methods learn from performances of one GL task (e.g., node classification), and are evaluated by predicting performances of a different GL task (e.g., link prediction). This task present an additional challenge to model the relation between two different, yet related GL tasks, and utilize the learned knowledge for transferable model selection.

Splitting: We first choose the source and target tasks, and split the graphs into the two sets, i.e., the source task set and the target task set. Then the graphs in the source set are used for training, and the graphs in the target set are used for testing.

5.2 Model Selection Algorithms↩︎

GLEMOS provides state-of-the art algorithms for instantaneous model selection, which are listed in . These algorithms are selected such that the benchmark covers representative techniques for model selection, in terms of whether they use meta-graph features (C1, ) and prior model performances (C2, ), and whether they are optimizable with trainable parameters (C3). Random Selection (RandSel) is used as a baseline to see how well model selection algorithms perform in comparison to random scoring. Global Best (GB)-AvgPerf and GB-AvgRank select a model that performed globally well on average. In contrast, ISAC [32] and ARGOSMART(AS) [31] perform model selection more locally with respect to the given graph, using meta-features. As GB methods rely only on prior performance, comparisons against them can help with investigating the effectiveness of meta-graph features. Supervised Surrogates () [33], ALORS [52], NCF [53], MetaOD [34], and MetaGL [13] are optimizable algorithms, which learn to estimate model performance by capturing the relation between meta-features and observed performances. In comparison to the simpler, non-optimizable algorithms above, we can investigate the advantages of different optimization components for instantaneous model selection. provides a more detailed description of each algorithm.

0.5em

2pt

Table 4: GLEMOS provides representative algorithms for instantaneous model selection. Algorithm characteristics denote whether they utilize meta-graph features (C1) and observed model performances (C2) for model selection, and whether they are optimizable (i.e., have trainable parameters) (C3).
Selection
Perf [34]
Rank [13]
[32]
[31]
(S2) [33]
[52]
[53]
[34]
[13]
C1. Use meta-features
C2. Use prior performances
C3. Optimizable

6 Experiments↩︎

In this section, we report how model selection methods perform in different testbeds. Based on those observations, we discuss the limitations of existing methods and future research directions.

6.1 Model Selection Performance↩︎

Evaluation Protocol. To measure how well model selection methods perform on the testbeds presented in , we evaluate their top-1 prediction results (i.e., the model predicted to be the best for the query graph) as model selection aims to find the best performing model as accurately as possible. Specifically, top-1 prediction performance is measured in terms of AUC, MAP (mean average precision), and NDCG (normalized discounted cumulative gain), all of which range from zero to one, with larger values indicating a better performance. We apply AUC and MAP by treating the task as a binary classification problem, in which the top-1 model is labeled as one, and all other models are labeled as zero. For NDCG, we report NDCG@1, which evaluates the ranking quality of the top-1 model. We evaluate these metrics multiple times for the data splits each testbed provides, and report the averaged performance. For reproducibility, GLEMOS provides the data splits of all testbeds.

-0.2em

3pt

1.0 -0.5em

Perf. Metric
AUC (\(\uparrow\)) 0.524 0.735 0.730 0.807 0.864 0.809 0.843 0.728 0.764 0.875
MRR (\(\uparrow\)) 0.016 0.087 0.064 0.134 0.371 0.198 0.201 0.073 0.096 0.295
NDCG@1 (\(\uparrow\)) 0.813 0.942 0.934 0.944 0.957 0.950 0.961 0.943 0.937 0.969

1.0 -0.5em

Table 5: Fully-Observed testbed results for link prediction (top) and node classification (bottom) tasks. Higher (\(\uparrow\)) scores are better. The best result is in bold, and the second best result is underlined.
Perf. Metric
AUC (\(\uparrow\)) 0.518 0.747 0.744 0.746 0.762 0.772 0.734 0.745 0.581 0.740
MRR (\(\uparrow\)) 0.029 0.102 0.124 0.118 0.181 0.110 0.103 0.124 0.041 0.129
NDCG@1 (\(\uparrow\)) 0.747 0.865 0.860 0.885 0.892 0.916 0.886 0.883 0.839 0.863

-0.3em

3pt

1.0 -0.5em

Perf. Metric Sparsity
AUC (\(\uparrow\)) 10% 0.524 0.733 0.732 0.804 0.829 0.813 0.831 0.735 0.743 0.865
30% 0.524 0.728 0.738 0.798 0.763 0.811 0.827 0.739 0.703 0.871
50% 0.524 0.704 0.730 0.790 0.690 0.839 0.814 0.739 0.669 0.866
70% 0.524 0.708 0.730 0.778 0.618 0.814 0.795 0.757 0.630 0.866
90% 0.524 0.717 0.732 0.720 0.547 0.464 0.687 0.656 0.599 0.811

1.0 -0.5em

Table 6: Sparse testbed results for link prediction (top) and node classification (bottom) tasks. Higher (\(\uparrow\)) scores are better. The best result is in bold, and the second best result is underlined.
Perf. Metric Sparsity
AUC (\(\uparrow\)) 10% 0.518 0.746 0.744 0.744 0.748 0.766 0.727 0.727 0.575 0.761
30% 0.518 0.743 0.738 0.734 0.680 0.769 0.741 0.735 0.533 0.736
50% 0.518 0.726 0.739 0.687 0.592 0.739 0.730 0.713 0.485 0.709
70% 0.518 0.692 0.738 0.653 0.571 0.684 0.694 0.709 0.483 0.662
90% 0.518 0.626 0.697 0.592 0.535 0.620 0.654 0.660 0.490 0.659

Fully-Observed Testbed (). Comparison between methods where meta-graph features are either used (e.g., AS, MetaGL) or not used (e.g., GB-Perf) shows the benefits of utilizing meta-graph features for GL model selection. While optimizable methods (e.g., NCF, MetaOD) have the additional flexibility to adaptively tune their behavior based on data, they are outperformed by relatively simple methods like ISAC and AS. At the same time, the best results on link prediction in the majority of metrics are achieved by another optimizable method, MetaGL, which shows the promising potential of optimizable framework for model selection. In node classification results, the performance decrease of optimizable methods are notable (e.g., MetaGL). One potential reason for this is that the graph set for node classification is relatively small compared to the graphs applicable for link prediction, which limits the effectiveness of optimizable algorithms that are more prone to overfitting in such cases.

Sparse Testbed (). As the sparsity of the performance matrix \(\boldsymbol{\mathrm{P}}\) increases, model selection methods perform increasingly worse. In particular, while AS achieves the best or the second best results in the Fully-Observed testbed, its performance quickly declines as sparsity increases. Since AS performs model selection based on the most similar observed graph, it cannot operate effectively in a highly sparse setting. Global averaging methods (e.g., GB-Perf), or more sophisticated optimizable methods show more stable results. Due to the additional requirement for node labels, node classification task in this setup presents the most data sparse, yet practically important regime.

Out-Of-Domain Testbed (). Graphs in the same or similar domains are often more similar to each other than graphs in different domains. As this testbed requires addressing additional challenges to achieve out-of-distribution generalization, most methods perform worse than in other in-distribution testbeds. For instance, AS, which are sensitive to the availability of observed graphs similar to the query graph, perform worse than in . On the other hand, optimizable methods show more promising results as they learn to extrapolate into new domains by learning from observed domains.

Small-To-Large Testbed (). In comparison to the Fully-Observed testbed, the performance decreases overall in this testbed. However, considering that methods learn only from small graphs, model selection for large graphs still performs quite well, often achieving a similar level of performance. Successful methods in this testbed can make the model selection pipeline much more efficient as performance collection for small graphs can be done much more efficiently than for large graphs.

Additional Results. We provide additional results in , including the results of the Cross-Task testbed, and results obtained with other meta-graph features, e.g., Mgraphlets and Mcompact.

-0.5em

3pt

1.0 -0.5em

Perf. Metric
AUC (\(\uparrow\)) 0.517 0.809 0.811 0.850 0.786 0.837 0.820 0.837 0.681 0.871
MRR (\(\uparrow\)) 0.018 0.110 0.101 0.125 0.237 0.116 0.109 0.109 0.047 0.148
NDCG@1 (\(\uparrow\)) 0.820 0.956 0.954 0.951 0.935 0.953 0.953 0.952 0.918 0.951

1.0 -0.5em

Table 7: Out-Of-Domain testbed results for link prediction (top) and node classification (bottom) tasks. Higher (\(\uparrow\)) scores are better. The best result is in bold, and the second best result is underlined.
Perf. Metric
AUC (\(\uparrow\)) 0.495 0.726 0.727 0.701 0.684 0.750 0.668 0.741 0.571 0.705
MRR (\(\uparrow\)) 0.019 0.074 0.086 0.046 0.060 0.089 0.056 0.066 0.044 0.082
NDCG@1 (\(\uparrow\)) 0.722 0.828 0.836 0.848 0.828 0.901 0.796 0.842 0.810 0.848

-0.5em

3pt

1.0 -0.5em

Perf. Metric
AUC (\(\uparrow\)) 0.522 0.798 0.797 0.842 0.827 0.767 0.812 0.796 0.667 0.870
MRR (\(\uparrow\)) 0.031 0.072 0.061 0.132 0.368 0.074 0.209 0.047 0.075 0.260
NDCG@1 (\(\uparrow\)) 0.841 0.958 0.960 0.957 0.951 0.953 0.947 0.956 0.921 0.964

1.0 -0.5em

Table 8: Small-To-Large testbed results for link prediction (top) and node classification (bottom) tasks. Higher (\(\uparrow\)) scores are better. The best result is in bold, and the second best result is underlined.
Perf. Metric
AUC (\(\uparrow\)) 0.508 0.724 0.726 0.711 0.761 0.664 0.701 0.697 0.467 0.736
MRR (\(\uparrow\)) 0.011 0.058 0.082 0.109 0.095 0.036 0.042 0.034 0.016 0.071
NDCG@1 (\(\uparrow\)) 0.795 0.861 0.896 0.902 0.883 0.855 0.862 0.864 0.830 0.857

6.2 Discussion on Limitations and Future Directions↩︎

Limitations. In principle, using GLEMOS to select a GL model to employ for a new graph is based on the assumption that similar graph datasets exist in the benchmark. Therefore, it may not be very effective if the new graph is significantly different from all graphs in the benchmark (e.g., the new graph is from a completely new domain). However, as the benchmark data continue to grow over time, such cases will be increasingly less likely, while model selection performances will likely improve with the addition of more data. Furthermore, while GLEMOS currently supports two fundamental GL tasks, namely, node classification and link prediction, it can be further extended with additional tasks (e.g., graph classification). Incorporating them into GLEMOS is one of the future plans.

Future Directions. Below we list promising research directions to further improve the algorithms as well as the benchmark for instantaneous GL model selection.

\(\mathbin{\vcenter{\rule{.5ex}{.5ex}}}~\)Direction 1: enabling model selection methods to use additional graph data. While existing methods utilize model performances and graph structural information captured by meta-features, they currently do not take other available graph data into account, such as node and edge features, timestamps in the case of dynamic graphs, and node and edge types (e.g., knowledge graphs). These data can be useful for modeling graph similarities, and the benchmark can further be enriched with such additional data.

\(\mathbin{\vcenter{\rule{.5ex}{.5ex}}}~\)Direction 2: developing data augmentation techniques. Adding new performance records to the benchmark can improve the effectiveness of model selection methods. However, it is often computationally expensive to train and evaluate GL models on non-trivial graphs. Data augmentation techniques for GL model performances can be helpful in this data sparse regime, especially for optimizable methods that require a lot of data to learn effectively.

\(\mathbin{\vcenter{\rule{.5ex}{.5ex}}}~\)Direction 3: handling out-of-distribution settings. Existing model selection methods are mainly designed for an in-distribution setup, as they assume that there exist observed graphs similar to a query graph. Thus their performance is suboptimal when a query graph comes from a new distribution. Investigating how to achieve generalization in such an out-of-distribution scenario would be beneficial.

\(\mathbin{\vcenter{\rule{.5ex}{.5ex}}}~\)Direction 4: effective performance collection. When we have a limited budget for performance measurements on new graphs, selecting which pairs of graphs and models to evaluate and include in the benchmark can greatly influence the learning of model selection methods. Thus the ability to find a small set of representative pairs can lead to a fast and effective performance collection. Challenges include how to make such selections from a heterogeneous model set with multiple GL methods.

7 Conclusion↩︎

The choice of a GL model has a significant impact on the performance of downstream tasks. Despite recent efforts to tackle this important problem, there exists no benchmark environment to evaluate the performance of GL model selection methods, and to support the development of new methods. In this work, we develop GLEMOS, the first benchmark environment for instantaneous GL model selection.

  • Extensive Benchmark Data. Among others, GLEMOS provides an extensive collection of model performances on fundamental GL tasks, i.e., link prediction and node classification, which is by far the largest and most comprehensive benchmark for Prob. 1 to the best of our knowledge.

  • Algorithms and Testbeds. GLEMOS provides representative algorithms for Prob. 1, as well as multiple testbeds to assess model selection performance in practical usage scenarios.

  • Extensible Open Source Environment. GLEMOS is designed to be easily extended with new GL models, new graphs, new performance records, and new GL tasks, while allowing reproducibility.

In the appendix, we provide additional experimental results (); details of model selection algorithms (), and meta-graph features (); experimental settings (); a discussion on the usage and extensibility of GLEMOS(); details of the data GLEMOS provides (); and hosting, licensing, and maintenance plan ().

8 Additional Results↩︎

8.1 Cross-Task Testbed Results↩︎

In , we report the cross-task testbed results in two transfer learning settings, i.e., (a) node classification to link prediction () and (b) link prediction to node classification (). Compared to other testbeds that operate on the performances measured for only one type of GL task, nearly all methods exhibit performance decline in this challenging setup, which indicates that GL models that are good for one type of task may not be as effective for another type of GL task. In contrast to other testbeds, sophisticated algorithms (e.g., MetaGL) tend to experience more performance decrease in this testbed than simple averaging methods (e.g., GB-Rank), which perform close to the best. By designing mechanisms that can model how performance characteristics on one task would translate to those on another, optimizable algorithms can be made much more effective in this setting.

-0.2em

3pt

1.0 -0.5em

Perf. Metric
AUC (\(\uparrow\))
(0.016)
(0.013)
(0.013)
(0.014)
(0.015)
(0.014)
(0.015)
(0.013)
(0.015)
(0.014)
MRR (\(\uparrow\))
(0.003)
(0.002)
(0.002)
(0.002)
(0.006)
(0.002)
(0.005)
(0.002)
(0.003)
(0.002)
NDCG@1 (\(\uparrow\))
(0.008)
(0.008)
(0.006)
(0.007)
(0.008)
(0.007)
(0.007)
(0.008)
(0.008)
(0.008)

1.0 -0.5em

Table 9: Cross-Task testbed results for node classification-to-link prediction (top) and link prediction-to-node classification (bottom) settings. Higher (\(\uparrow\)) scores are better. The numbers in the parentheses denote one standard error. The best result is in bold, and the second best result is underlined.
Perf. Metric
AUC (\(\uparrow\))
(0.025)
(0.025)
(0.025)
(0.026)
(0.025)
(0.024)
(0.025)
(0.025)
(0.024)
(0.026)
MRR (\(\uparrow\))
(0.008)
(0.016)
(0.011)
(0.012)
(0.009)
(0.010)
(0.006)
(0.016)
(0.001)
(0.012)
NDCG@1 (\(\uparrow\))
(0.022)
(0.018)
(0.019)
(0.021)
(0.020)
(0.021)
(0.022)
(0.017)
(0.021)
(0.021)

8.2 Results With Different Sets of Meta-Graph Features↩︎

We present three sets of meta-graph features in the main text, i.e., Mregular, Mgraphlets, and Mcompact, which consist of different types of graph structural features. Here we provide these different sets of meta-graph features to model selection algorithms, and evaluate their performance using each set. In addition to the above three sets, we also use the concatenation of Mregular and Mgraphlets, denoted Mreg+graphlets, which augments the regular structural features with graphlet-based features, forming the largest set with 1074 meta-graph features. In , we report the performance of model selection algorithms in terms of their ROC AUC scores for the five testbeds. Since the Random Selection and Global Best (GB) algorithms are independent of meta-features, their performances are the same across different features. From the results below, we make the following observations.

Using additional meta-graph features can improve model selection results. For instance, in Sparse testbed (), the best performing method, MetaGL, achieves the highest AUC by using an augmented feature set Mreg+graphlets. As distinct graph features may capture different aspects of graph structural properties, they can provide further information to find a better model.

More features do not always lead to a better performance. For example, in , we see mixed results with optimizable methods (e.g., NCF). In some cases, they experience some performance improvements, while in others, their performance declines as they use more features. The capability to adaptively utilize meta-features for the given context could further improve their performances.

The impact of different meta-graph features is more pronounced in the more challenging transfer learning settings, i.e., Out-Of-Domain (), Small-To-Large (), and Cross-Task () testbeds. These testbeds present additional challenges for model selection methods to achieve an effective generalization (e.g., large differences exist in graph data distributions or graph sizes between training and testing phases). As existing methods do not take such challenges into account, they are prone to overfitting and thus may not generalize well in the testing phase. For example, the performance of MetaGL in the Cross-Task testbed () is the lowest when using the largest feature set Mreg+graphlets. A stronger and more robust transfer capability would be needed to enable a better use of additional meta-graph features in such cases. On the other hand, we also observe that using more meta-features leads to a significant performance improvement for relatively simple methods, e.g., ISAC in Small-To-Large testbed (), which shows the promises and potential of meta-graph features to handle these challenging transfer learning settings.

-0.5em

3pt

1.0 -0.5em

Perf. Metric Meta-Feature
AUC (\(\uparrow\)) Mcompact
(0.013)
(0.011)
(0.010)
(0.011)
(0.010)
(0.009)
(0.009)
(0.011)
(0.015)
(0.009)
2-12 Mregular
(0.013)
(0.011)
(0.010)
(0.011)
(0.010)
(0.011)
(0.010)
(0.011)
(0.014)
(0.009)
2-12 Mgraphlets
(0.013)
(0.011)
(0.010)
(0.011)
(0.011)
(0.011)
(0.010)
(0.011)
(0.014)
(0.009)
2-12 Mreg+graphlets
(0.013)
(0.011)
(0.010)
(0.011)
(0.010)
(0.010)
(0.010)
(0.010)
(0.014)
(0.009)

1.0 -0.5em

Table 10: Fully-Observed testbed results, obtained with different meta-graph features, for link prediction (top) and node classification (bottom) tasks. Mreg+graphlets denotes a concatenation of Mregular and Mgraphlets meta-graph features. Higher (\(\uparrow\)) scores are better. The numbers in the parentheses denote one standard error. The best result is in bold, and the second best result is underlined.
Perf. Metric Meta-Feature
AUC (\(\uparrow\)) Mcompact
(0.026)
(0.024)
(0.024)
(0.024)
(0.023)
(0.022)
(0.023)
(0.024)
(0.029)
(0.023)
2-12 Mregular
(0.026)
(0.024)
(0.024)
(0.023)
(0.023)
(0.022)
(0.023)
(0.025)
(0.028)
(0.024)
2-12 Mgraphlets
(0.026)
(0.024)
(0.024)
(0.024)
(0.026)
(0.023)
(0.025)
(0.024)
(0.029)
(0.023)
2-12 Mreg+graphlets
(0.026)
(0.024)
(0.024)
(0.023)
(0.026)
(0.025)
(0.023)
(0.025)
(0.029)
(0.025)

-0.2em

3pt

1.0 -0.5em

Perf. Metric Meta-Feature
AUC (\(\uparrow\)) Mcompact
(0.013)
(0.011)
(0.010)
(0.011)
(0.012)
(0.009)
(0.010)
(0.011)
(0.014)
(0.010)
2-12 Mregular
(0.013)
(0.011)
(0.010)
(0.011)
(0.012)
(0.010)
(0.010)
(0.011)
(0.015)
(0.010)
2-12 Mgraphlets
(0.013)
(0.011)
(0.010)
(0.011)
(0.013)
(0.010)
(0.010)
(0.011)
(0.015)
(0.009)
2-12 Mreg+graphlets
(0.013)
(0.011)
(0.010)
(0.011)
(0.012)
(0.010)
(0.010)
(0.010)
(0.014)
(0.010)

1.0 -0.5em

Table 11: Sparse testbed results, obtained with different meta-graph features, and performance matrices with a sparsity of \(50\%\), for link prediction (top) and node classification (bottom) tasks. Mreg+graphlets denotes a concatenation of Mregular and Mgraphlets meta-graph features. Higher (\(\uparrow\)) scores are better. The numbers in the parentheses denote one standard error. The best result is in bold, and the second best result is underlined.
Perf. Metric Meta-Feature
AUC (\(\uparrow\)) Mcompact
(0.026)
(0.024)
(0.024)
(0.025)
(0.023)
(0.023)
(0.023)
(0.024)
(0.030)
(0.024)
2-12 Mregular
(0.026)
(0.024)
(0.024)
(0.022)
(0.021)
(0.024)
(0.024)
(0.025)
(0.031)
(0.023)
2-12 Mgraphlets
(0.026)
(0.024)
(0.024)
(0.024)
(0.024)
(0.025)
(0.023)
(0.023)
(0.030)
(0.025)
2-12 Mreg+graphlets
(0.026)
(0.024)
(0.024)
(0.024)
(0.025)
(0.024)
(0.022)
(0.024)
(0.029)
(0.023)

-0.2em

3pt

1.0 -0.5em

Perf. Metric Meta-Feature
AUC (\(\uparrow\)) Mcompact
(0.013)
(0.010)
(0.010)
(0.012)
(0.012)
(0.010)
(0.009)
(0.010)
(0.015)
(0.009)
2-12 Mregular
(0.013)
(0.010)
(0.010)
(0.009)
(0.012)
(0.010)
(0.009)
(0.009)
(0.015)
(0.009)
2-12 Mgraphlets
(0.013)
(0.010)
(0.010)
(0.011)
(0.011)
(0.012)
(0.010)
(0.010)
(0.015)
(0.009)
2-12 Mreg+graphlets
(0.013)
(0.010)
(0.010)
(0.011)
(0.011)
(0.009)
(0.009)
(0.010)
(0.015)
(0.009)

1.0 -0.5em

Table 12: Out-Of-Domain testbed results, obtained with different meta-graph features, for link prediction (top) and node classification (bottom) tasks. Mreg+graphlets denotes a concatenation of Mregular and Mgraphlets meta-graph features. Higher (\(\uparrow\)) scores are better. The numbers in the parentheses denote one standard error. The best result is in bold, and the second best result is underlined.
Perf. Metric Meta-Feature
AUC (\(\uparrow\)) Mcompact
(0.025)
(0.025)
(0.025)
(0.025)
(0.025)
(0.022)
(0.023)
(0.023)
(0.027)
(0.024)
2-12 Mregular
(0.025)
(0.025)
(0.025)
(0.025)
(0.023)
(0.023)
(0.026)
(0.024)
(0.027)
(0.023)
2-12 Mgraphlets
(0.025)
(0.025)
(0.025)
(0.025)
(0.026)
(0.022)
(0.024)
(0.024)
(0.030)
(0.024)
2-12 Mreg+graphlets
(0.025)
(0.025)
(0.025)
(0.024)
(0.025)
(0.023)
(0.025)
(0.026)
(0.029)
(0.025)

-0.2em

3pt

1.0 -0.5em

Perf. Metric Meta-Feature
AUC (\(\uparrow\)) Mcompact
(0.027)
(0.017)
(0.017)
(0.022)
(0.020)
(0.021)
(0.018)
(0.017)
(0.029)
(0.018)
2-12 Mregular
(0.027)
(0.017)
(0.017)
(0.018)
(0.022)
(0.020)
(0.024)
(0.017)
(0.029)
(0.018)
2-12 Mgraphlets
(0.027)
(0.017)
(0.017)
(0.016)
(0.019)
(0.021)
(0.020)
(0.018)
(0.028)
(0.016)
2-12 Mreg+graphlets
(0.027)
(0.017)
(0.017)
(0.016)
(0.020)
(0.018)
(0.021)
(0.018)
(0.027)
(0.018)

1.0 -0.5em

Table 13: Small-To-Large testbed results, obtained with different meta-graph features, for link prediction (top) and node classification (bottom) tasks. Mreg+graphlets denotes a concatenation of Mregular and Mgraphlets meta-graph features. Higher (\(\uparrow\)) scores are better. The numbers in the parentheses denote one standard error. The best result is in bold, and the second best result is underlined.
Perf. Metric Meta-Feature
AUC (\(\uparrow\)) Mcompact
(0.039)
(0.042)
(0.042)
(0.043)
(0.038)
(0.044)
(0.041)
(0.039)
(0.039)
(0.041)
2-12 Mregular
(0.039)
(0.042)
(0.042)
(0.044)
(0.037)
(0.041)
(0.035)
(0.042)
(0.043)
(0.037)
2-12 Mgraphlets
(0.039)
(0.042)
(0.042)
(0.039)
(0.037)
(0.039)
(0.042)
(0.039)
(0.044)
(0.040)
2-12 Mreg+graphlets
(0.039)
(0.042)
(0.042)
(0.039)
(0.044)
(0.042)
(0.038)
(0.041)
(0.043)
(0.038)

-0.2em

3pt

1.0 -0.5em

Perf. Metric Meta-Feature
AUC (\(\uparrow\)) Mcompact
(0.016)
(0.013)
(0.013)
(0.013)
(0.016)
(0.014)
(0.014)
(0.013)
(0.016)
(0.014)
2-12 Mregular
(0.016)
(0.013)
(0.013)
(0.014)
(0.015)
(0.014)
(0.015)
(0.013)
(0.015)
(0.014)
2-12 Mgraphlets
(0.016)
(0.013)
(0.013)
(0.014)
(0.016)
(0.014)
(0.015)
(0.014)
(0.015)
(0.014)
2-12 Mreg+graphlets
(0.016)
(0.013)
(0.013)
(0.015)
(0.015)
(0.014)
(0.015)
(0.013)
(0.015)
(0.015)

1.0 -0.5em

Table 14: Cross-Task testbed results, obtained with different meta-graph features, for node classification-to-link prediction (top) and link prediction-to-node classification (bottom) settings.Mreg+graphlets denotes a concatenation of Mregular and Mgraphlets meta-graph features. Higher (\(\uparrow\)) scores are better. The numbers in the parentheses denote one standard error. The best result is in bold, and the second best result is underlined.
Perf. Metric Meta-Feature
AUC (\(\uparrow\)) Mcompact
(0.025)
(0.025)
(0.025)
(0.027)
(0.027)
(0.025)
(0.025)
(0.025)
(0.024)
(0.024)
2-12 Mregular
(0.025)
(0.025)
(0.025)
(0.026)
(0.025)
(0.024)
(0.025)
(0.025)
(0.024)
(0.026)
2-12 Mgraphlets
(0.025)
(0.025)
(0.025)
(0.029)
(0.026)
(0.025)
(0.024)
(0.025)
(0.025)
(0.025)
2-12 Mreg+graphlets
(0.025)
(0.025)
(0.025)
(0.028)
(0.026)
(0.025)
(0.025)
(0.026)
(0.026)
(0.027)

8.3 Results on Time Cost↩︎

Figure 4: Time taken for selecting the best model (in seconds), which includes the time for generating meta-graph features for the new graph, and the time for model selection algorithms to infer the best model for the given graph based on them.

Model Selection Runtime. In , we report the distribution of runtime (in seconds) for model selection, measured over the graphs in the benchmark. The runtime includes the time for generating meta-graph features for the new graph, and the time for a model selection method to infer the best model based on them. The median runtime (shown by the red line) is less than one second, and for a majority of the graphs, it takes at most five to six seconds. Note that the distributions are mostly the same for different methods, as the time for different model selection algorithms to infer the best model is very short (close to zero second), and similar to each other. We measured the time for meta feature generation via sequential processing for simplicity, while these features can be processed in parallel as they are independent of each other.

Training Runtime. reports the time (in seconds) taken for training model selection algorithms until convergence on the Fully-Observed testbed for the link prediction task. Note that Global Best algorithms and AS are excluded as they do not require model training. ISAC takes the least amount of time as it only performs clustering without model parameter updates during the training phase. Algorithms that rely on neural networks require more training time. Yet a majority of them can still be trained quickly, in just a few seconds to a few minutes. MetaOD takes the largest amount of time with its current optimization framework.

0.5em

Table 15: Time (in seconds) taken for training model selection algorithms on the Fully-Observed testbed for the link prediction task. For each algorithm, we show the training runtime averaged over the splits in the testbed, and the standard deviation in the parentheses.
(0.0279)
(0.6473)
(0.0897)
(2.3227)
(89.6785)
(50.9201)

8.4 Testbed Results With Standard Error↩︎

In the main text, we provide the average model selection performances in four testbeds, i.e., Fully-Observed, Sparse, Out-Of-Domain, and Small-To-Large testbeds, but their standard errors could not be shown due to space constraint. In this subsection, we present the standard error along with the average performance in those four testbeds (). Please refer to the main text for the discussion of the results of these testbeds. Cross-Task testbed results are discussed in

-0.2em

3pt

1.0 -0.5em

Perf. Metric
AUC (\(\uparrow\))
(0.013)
(0.011)
(0.010)
(0.011)
(0.010)
(0.011)
(0.010)
(0.011)
(0.014)
(0.009)
MRR (\(\uparrow\))
(0.001)
(0.010)
(0.007)
(0.011)
(0.019)
(0.015)
(0.015)
(0.008)
(0.008)
(0.017)
NDCG@1 (\(\uparrow\))
(0.006)
(0.003)
(0.004)
(0.004)
(0.004)
(0.004)
(0.003)
(0.003)
(0.003)
(0.003)

1.0 -0.5em

Table 16: Fully-Observed testbed results for link prediction (top) and node classification (bottom) tasks. Higher (\(\uparrow\)) scores are better. The numbers in the parentheses denote one standard error. The best result is in bold, and the second best result is underlined.
Perf. Metric
AUC (\(\uparrow\))
(0.026)
(0.024)
(0.024)
(0.023)
(0.023)
(0.022)
(0.023)
(0.025)
(0.028)
(0.024)
MRR (\(\uparrow\))
(0.008)
(0.018)
(0.022)
(0.021)
(0.029)
(0.019)
(0.019)
(0.023)
(0.010)
(0.023)
NDCG@1 (\(\uparrow\))
(0.021)
(0.015)
(0.015)
(0.013)
(0.013)
(0.010)
(0.014)
(0.013)
(0.017)
(0.019)

-0.3em

3pt

1.0 -0.5em

Perf. Metric Sparsity
AUC (\(\uparrow\)) 10%
(0.013)
(0.011)
(0.010)
(0.011)
(0.011)
(0.011)
(0.010)
(0.011)
(0.014)
(0.010)
2-12 30%
(0.013)
(0.011)
(0.010)
(0.011)
(0.012)
(0.011)
(0.010)
(0.011)
(0.014)
(0.010)
2-12 50%
(0.013)
(0.011)
(0.010)
(0.011)
(0.012)
(0.010)
(0.010)
(0.011)
(0.015)
(0.010)
2-12 70%
(0.013)
(0.010)
(0.010)
(0.011)
(0.011)
(0.011)
(0.010)
(0.010)
(0.015)
(0.010)
2-12 90%
(0.013)
(0.011)
(0.010)
(0.012)
(0.007)
(0.012)
(0.012)
(0.012)
(0.015)
(0.011)

1.0 -0.5em

Table 17: Sparse testbed results for link prediction (top) and node classification (bottom) tasks. Higher (\(\uparrow\)) scores are better. The numbers in the parentheses denote one standard error. The best result is in bold, and the second best result is underlined.
Perf. Metric Sparsity
AUC (\(\uparrow\)) 10%
(0.026)
(0.024)
(0.024)
(0.023)
(0.023)
(0.023)
(0.026)
(0.024)
(0.029)
(0.021)
2-12 30%
(0.026)
(0.023)
(0.024)
(0.023)
(0.023)
(0.022)
(0.024)
(0.024)
(0.031)
(0.023)
2-12 50%
(0.026)
(0.024)
(0.024)
(0.022)
(0.021)
(0.024)
(0.024)
(0.025)
(0.031)
(0.023)
2-12 70%
(0.026)
(0.024)
(0.023)
(0.023)
(0.018)
(0.023)
(0.024)
(0.023)
(0.029)
(0.026)
2-12 90%
(0.026)
(0.026)
(0.025)
(0.026)
(0.012)
(0.027)
(0.026)
(0.026)
(0.030)
(0.024)

-0.5em

3pt

1.0 -0.5em

Perf. Metric
AUC (\(\uparrow\))
(0.013)
(0.010)
(0.010)
(0.009)
(0.012)
(0.010)
(0.009)
(0.009)
(0.015)
(0.009)
MRR (\(\uparrow\))
(0.002)
(0.010)
(0.008)
(0.010)
(0.017)
(0.010)
(0.010)
(0.010)
(0.005)
(0.011)
NDCG@1 (\(\uparrow\))
(0.007)
(0.003)
(0.003)
(0.003)
(0.004)
(0.003)
(0.003)
(0.003)
(0.004)
(0.004)

1.0 -0.5em

Table 18: Out-Of-Domain testbed results for link prediction (top) and node classification (bottom) tasks. Higher (\(\uparrow\)) scores are better. The numbers in the parentheses denote one standard error. The best result is in bold, and the second best result is underlined.
Perf. Metric
AUC (\(\uparrow\))
(0.025)
(0.025)
(0.025)
(0.025)
(0.023)
(0.023)
(0.026)
(0.024)
(0.027)
(0.023)
MRR (\(\uparrow\))
(0.005)
(0.015)
(0.019)
(0.009)
(0.016)
(0.017)
(0.012)
(0.011)
(0.012)
(0.017)
NDCG@1 (\(\uparrow\))
(0.023)
(0.016)
(0.015)
(0.013)
(0.017)
(0.011)
(0.019)
(0.015)
(0.019)
(0.016)

-0.5em

3pt

1.0 -0.5em

Perf. Metric
AUC (\(\uparrow\))
(0.027)
(0.017)
(0.017)
(0.018)
(0.022)
(0.020)
(0.024)
(0.017)
(0.029)
(0.018)
MRR (\(\uparrow\))
(0.009)
(0.011)
(0.011)
(0.019)
(0.036)
(0.016)
(0.029)
(0.005)
(0.012)
(0.031)
NDCG@1 (\(\uparrow\))
(0.011)
(0.004)
(0.004)
(0.005)
(0.008)
(0.005)
(0.007)
(0.004)
(0.008)
(0.006)

1.0 -0.5em

Table 19: Small-To-Large testbed results for link prediction (top) and node classification (bottom) tasks. Higher (\(\uparrow\)) scores are better. The numbers in the parentheses denote one standard error. The best result is in bold, and the second best result is underlined.
Perf. Metric
AUC (\(\uparrow\))
(0.039)
(0.042)
(0.042)
(0.044)
(0.037)
(0.041)
(0.035)
(0.042)
(0.043)
(0.037)
MRR (\(\uparrow\))
(0.002)
(0.021)
(0.023)
(0.025)
(0.029)
(0.009)
(0.013)
(0.008)
(0.007)
(0.024)
NDCG@1 (\(\uparrow\))
(0.032)
(0.028)
(0.025)
(0.024)
(0.023)
(0.024)
(0.028)
(0.029)
(0.031)
(0.027)

9 Model Selection Algorithms↩︎

Here we provide details of the model selection algorithms included in GLEMOS.

Random Selection assigns random scores to the models, and thus chooses the best model purely randomly without considering prior model performances and meta-graph features.

Global Best (GB)-AvgPerf [34] computes the average performance of each model over all observed graphs, and selects the one with the largest average performance. Thus this algorithm is independent of meta-graph features.

Global Best (GB)-AvgRank [13] computes the rank (in percentile) of all models for each graph, where a higher performance is assigned a larger rank percentile, and selects the model that has the largest average rank percentile over observed graphs. Thus this algorithm is also independent of meta-graph features.

ISAC [32] clusters observed graphs into \(k\) groups in the space of meta-graph features, and when given a new graph, finds the cluster nearest to the given graph and selects the model that obtained the highest average performance over the observed graphs in that cluster. \(k\) is set to 5 in our experiments.

ARGOSMART(AS) [31] finds the observed graph, which is the most similar to the test graph in terms of meta-graph features, and selects the model that had the best performance on that graph.

Supervised Surrogates () [33] optimizes a surrogate model (i.e., a regressor) to transform meta-graph features into model performances. We use a two-layer feedforward neural network with ReLU nonlinearity as the surrogate model.

ALORS [52] learns latent factors of the observed graphs and models by performing low-rank nonnegative matrix factorization on the performance matrix, and then optimizes a non-linear regressor to map meta-graph features into latent graph factors. Given a new graph, it predicts the new graph’s latent factor, and estimates the model performances on the new graph to be the dot product between the estimated graph factor and the learned model factor.

NCF [53] adapts ALORS by replacing the dot product operation used in ALORS with a more flexible neural network model, which predicts model performances by jointly employing the linearity of matrix factorization and nonlinearity of deep neural networks.

MetaOD [34] improves ALORS for the model selection task, e.g., by employing a rank-based meta-learning objective instead of the usual reconstruction objective. As the first method for unsupervised outlier model selection, it also presents specialized meta-features to capture the outlying characteristics of non-graph data.

MetaGL [13] extends MetaOD for GL model selection by designing meta-graph features to capture the structural characteristics of a graph, employing the top-1 probability as a meta-training objective, and modeling the relations between graphs and models in the form of a heterogeneous graph and learning latent factors by applying graph neural networks on it.

10 Meta-Graph Features↩︎

10.1 Time Complexity Analysis↩︎

Let \(G=(V, E)\) be a graph, where \(V\) and \(E\) denote the set of nodes and edges in graph \(G\), respectively. Overall, the time complexity of generating meta-graph features for graph \(G\) is \[\begin{align} \mathcal{O}(k|E|\Delta) \end{align}\] where \(k\) is the number of feature extractors in the extractor set \(|\Psi|\), and \(\Delta\) denotes a small constant.

Each feature extractor in \(\Psi\) extracts a specific graph structural feature, such as network motifs and PageRank scores, as well as other graph statistics such as the density of a graph. We estimate the frequency of all network motifs with \(\{2,3,4\}\)-nodes, which can be done in \(\mathcal{O}(|E|\Delta)\) time, where \(\Delta\) is a small constant representing the maximum sampled degree, which can be set by the user. For further details, please refer to [54]. Other structural features, such as PageRank, take \(\mathcal{O}(|E|)\) time, and graph statistics, such as the number of nodes and edges and the graph density, can be obtained even more efficiently. Assuming \(k\) such feature extractors, overall it takes \(\mathcal{O}(k|E|\Delta)\) time.

Note that GLEMOS aims to provide a representative and diverse set of graph features, which have been proven effective in previous studies. At the same time, our framework is flexible, and can support any set of meta-graph features. Thus, depending on the task, more computationally expensive, yet informative new features might be additionally used, or the set of features may be limited to only those that can be efficiently computed in time strictly linear in the number of edges, i.e., \(\mathcal{O}(|E|)\), in which case the \(\Delta\) term would be dropped, and we have \(\mathcal{O}(k|E|)\). Further, note that feature extractors are independent of each other, and can be computed in parallel.

10.2 Statistical Functions↩︎

lists the statistical functions \(\Sigma\), which derives a set of meta-graph features that summarize the statistical distribution of graph invariants, such as the node degree distribution, or \(k\)-core numbers. Such vectors have different sizes for different graphs as their size is determined by the number of nodes or edges of the graph. The statistical functions \(\Sigma\) in transforms those vectors of varying length into fixed-size meta-feature vectors.

Table 20: Statistical functions \(\Sigma\) to derive a set of meta-graph features from a graph invariant, e.g., degree distribution or \(k\)-core numbers.\(\boldsymbol{\mathrm{x}}\) denotes a vector of arbitrary graph invariants for some graph \(G_i=(V_i,E_i)\), such as node degree vector, and PageRank vector (i.e., PageRank scores of each node in \(G_i\)). \(\pi(\boldsymbol{\mathrm{x}})\) denotes the sorted vector of \(\boldsymbol{\mathrm{x}}\).
Function Equation Rationale Variants Representation (\(\boldsymbol{\mathrm{x}}\), ...) Variable Type (Q,C,T)
Min, Max \(\min(\boldsymbol{\mathrm{x}})\), \(\max(\boldsymbol{\mathrm{x}})\) \(-\)
Median \(\mathsf{med}(\boldsymbol{\mathrm{x}})\) Variable normality
Geometric Mean \(\left|\boldsymbol{\mathrm{x}}\right|^{-1} \prod_i x_i\) Variable normality
Harmonic Mean \(\left|\boldsymbol{\mathrm{x}}\right| / \sum_i \frac{1}{x_i}\) Variable normality
Mean, Stdev, Variance \(\mu_{\boldsymbol{\mathrm{x}}}\), \(\sigma_{\boldsymbol{\mathrm{x}}}\), \(\sigma^2_{\boldsymbol{\mathrm{x}}}\) Variable normality
Skewness \(\frac{\mathbb{E}(\boldsymbol{\mathrm{x}}- \mu_{\boldsymbol{\mathrm{x}}})^3}{\sigma^3_{\boldsymbol{\mathrm{x}}}}\) Variable normality
Pearson Kurtosis \(\textrm{Kurt}[\boldsymbol{\mathrm{x}}]=\frac{\mathbb{E}(\boldsymbol{\mathrm{x}}- \mu_{\boldsymbol{\mathrm{x}}})^4}{\sigma^4_{\boldsymbol{\mathrm{x}}}}\) (biased/unbiased) Variable normality Fisher/Pearson, and bias/unbiased.
Fisher Kurtosis \(\textrm{Kurt}[\boldsymbol{\mathrm{x}}]-3.0\) (biased/unbiased) Variable normality Fisher/Pearson, and bias/unbiased.
Quartile Dispersion Coeff. \(\frac{Q_3-Q_1}{Q_3+Q_1}\) Dispersion
Median Absolute Deviation \(\mathsf{med}(\left|\boldsymbol{\mathrm{x}}- \mathsf{med}(\boldsymbol{\mathrm{x}})\right|)\) Dispersion
Avg. Absolute Deviation \(\frac{1}{\left|\boldsymbol{\mathrm{x}}\right|} \boldsymbol{\mathrm{e}}^T\!\left|\boldsymbol{\mathrm{x}}- \mu_{\boldsymbol{\mathrm{x}}}\right|\) Dispersion
Coeff. of Variation \(\frac{\sigma_{\boldsymbol{\mathrm{x}}}}{\mu_{\boldsymbol{\mathrm{x}}}}\) Dispersion
Efficiency Ratio \(\frac{\sigma^2_{\boldsymbol{\mathrm{x}}}}{\mu^2_{\boldsymbol{\mathrm{x}}}}\) Dispersion
Variance-to-Mean Ratio \(\frac{\sigma^2_{\boldsymbol{\mathrm{x}}}}{\mu_{\boldsymbol{\mathrm{x}}}}\) Dispersion
Signal-to-Noise Ratio (SNR) \(\frac{\mu^2_{\boldsymbol{\mathrm{x}}}}{\sigma^2_{\boldsymbol{\mathrm{x}}}}\) Noisiness of data
Entropy \(H(\boldsymbol{\mathrm{x}}) = -\sum_{i} \; x_i \log x_i\) Variable Informativeness
Norm. Entropy \({H(\boldsymbol{\mathrm{x}})} / {\log_2 \left|\boldsymbol{\mathrm{x}}\right|}\) Variable Informativeness
Gini Coefficient \(\frac{\sum_{i=1}^{\left|\boldsymbol{\mathrm{x}}\right|} (2i-\left|\boldsymbol{\mathrm{x}}\right|-1) \pi(\boldsymbol{\mathrm{x}})_i}{n \sum_{i=1}^{\left|\boldsymbol{\mathrm{x}}\right|} \pi(\boldsymbol{\mathrm{x}})_i}\) Variable Informativeness
\(Q_1\), \(Q_3\) median of the \(\left|\boldsymbol{\mathrm{x}}\right|/2\) smallest (largest) values \(-\)
IQR \(Q_3 - Q_1\) \(-\)
Outlier LB \(\alpha \in \{1.5,3\}\) \(Q_1-\alpha IQR\) Data noisiness
Outlier UB \(\alpha \in \{1.5,3\}\) \(Q_3+\alpha IQR\) Data noisiness
\(~~~~~~(\beta,\gamma) \in \{(1,0),(0,1),(1,1)\}\) \(\beta \cdot \sum_{i} \mathbb{I}(x_i \!<\! Q_1\!-\!\alpha IQR) + \gamma \cdot \sum_{i} \mathbb{I}(x_i \!>\! Q_3+\alpha IQR)\) Data noisiness
\(~~~~~~(\beta,\gamma) \in \{(1,0),(0,1),(1,1)\}\) \((\beta \cdot \sum_{i} \mathbb{I}(x_i \!<\! Q_1\!-\!\alpha IQR) + \gamma \cdot \sum_{i} \mathbb{I}(x_i \!>\! Q_3+\alpha IQR)) / \left|\boldsymbol{\mathrm{x}}\right|\) Data noisiness
\(~~~~~~(\beta,\gamma) \in \{(1,0),(0,1),(1,1)\}\) \(\beta \cdot \sum_{i} \mathbb{I}(x_i \!<\! \mu_{\boldsymbol{\mathrm{x}}} - \alpha \sigma_{\boldsymbol{\mathrm{x}}}) + \gamma \cdot \sum_{i} \mathbb{I}(x_i \!>\! \mu_{\boldsymbol{\mathrm{x}}} + \alpha \sigma_{\boldsymbol{\mathrm{x}}})\) Data noisiness \(o/\left|\boldsymbol{\mathrm{x}}\right|\), lb, ub, total
\(~~~~~~(\beta,\gamma) \in \{(1,0),(0,1),(1,1)\}\) \((\beta \cdot \sum_{i} \mathbb{I}(x_i \!<\! \mu_{\boldsymbol{\mathrm{x}}} - \alpha \sigma_{\boldsymbol{\mathrm{x}}}) + \gamma \cdot \sum_{i} \mathbb{I}(x_i \!>\! \mu_{\boldsymbol{\mathrm{x}}} + \alpha \sigma_{\boldsymbol{\mathrm{x}}})) / \left|\boldsymbol{\mathrm{x}}\right|\) Data noisiness \(o/\left|\boldsymbol{\mathrm{x}}\right|\), lb, ub, total
Mode modal (most common) value in \(\boldsymbol{\mathrm{x}}\)
Mode Count count for the modal value in \(\boldsymbol{\mathrm{x}}\)
Mode Frac. mode count of \(\boldsymbol{\mathrm{x}}\) / \(\left|\boldsymbol{\mathrm{x}}\right|\)

11 Experimental Settings and Details↩︎

Hardware. Experiments were performed on a Linux server on AWS, running Ubuntu 20.04.5 LTS with Intel Xeon Platinum 8275CL CPUs @ 3.00GHz, 1.1TB RAM, and NVIDIA A100 SXM4 GPUs with 40GB memory.

Performance Evaluation. To evaluate the performance of optimizable GL methods, such as GCN [35] and GraphSAGE [45], we trained these methods for up to 300 epochs, using Adam optimizer with a learning rate of 0.001, and applying a validation-based early stopping with a patience of 30 epochs. A few graphs have multiple labels for each node. For those graphs, we found that using a larger learning rate and patience leads to a better performance. So for the multi-label node classification datasets, we used a learning rate of 0.01 and a patience of 60 epochs. As an early stopping criterion, we used ROC AUC for link prediction, and used accuracy for node classification (or weighted average precision for multi-label node classification). Not all graphs come with input node features. For those graphs without input node features, we used randomly initialized embeddings of size 32 as input node embeddings, and let those embeddings optimized during model training.

Meta-Graph Features. We present three sets of meta-graph features in the main text, i.e., Mregular, Mgraphlets, and Mcompact. The experimental results reported in the main text were obtained using Mregular. We also report results obtained with three different sets of meta-graph features, i.e., Mgraphlets, Mcompact, and Mreg+graphlets, in .

Graph Learning (GL) Methods. In the evaluation using the proposed testbeds, model selection algorithms aim to predict the best model from the set of differently configured GL models, that is, GCN [35], GraphSAGE [45], GAT [36], GIN [46], EGC [47], SGC [37], ChebNet [48], PNA [49], DGI [39], spectral embedding [40], GraRep [50], node2vec [38], label propagation [41], Jaccard’s Coeff. [43], Resource Alloc. [51], Adamic/Adar [43], and SEAL [42]. Their hyperparameter settings are provided in Table 2 in the main text. We implemented spectral embedding [40] and GraRep [50], using NumPy2 and SciPy3. We used GRAPE [55] for the implementation of node2vec [38]. We used NetworkX [56] for the implementation of classical link prediction methods, i.e., Jaccard’s Coeff. [43], Resource Alloc. [51], and Adamic/Adar [43]. We used PyTorch Geometric (PyG) [57] for the implementation of other GL methods, e.g., GCN [35] and GraphSAGE [45]. Spectral embedding and GraRep use SciPy’s functionalities to find eigenvalue/eigenvectors and perform singular value decomposition, respectively; for these methods, we used the default parameter values specified by SciPy, except for the parameters that we vary in creating the model set \(\mathbcal{M}\). Similarly, for methods supported by PyG, we used their default parameter settings in the corresponding package, while varying a few important parameters to create the model set \(\mathbcal{M}\).

Model Selection Algorithms. In experiments, we evaluate ten model selection algorithms, that is, Random Selection, Global Best-AvgPerf [34], Global Best-AvgRank [13], ISAC [32], ARGOSMART(AS) [31], Supervised Surrogates () [33], ALORS [52], NCF [53], MetaOD [34], and MetaGL [13]. For MetaGL [13] and MetaOD [34], we used the authors’ implementation. We adapted MetaOD’s implementation so that it can work with a sparse performance matrix. We implemented other model selection algorithms included in GLEMOS in python using open source libraries such as NumPy and DGL4. Global Best methods perform a global averaging of the performance matrix. Given sparse performance matrices, they average over observed entries alone and ignore missing entries. ISAC [32] applies \(k\)-means algorithm to meta-graph features to cluster observed graphs into \(5\) groups. AS [31] uses cosine similarity scoring to find the \(1\)-NN observed graph.  [33] uses Adam optimizer to train an MLP regressor with two hidden layers, which is optimized to transform meta-graph features into model performances. ALORS [52] learns latent embeddings by using nonnegative matrix factorization, and uses an MLP regressor with two hidden layers to transform meta-graph features into latent graph factors. NCF [53] produces latent graph and model embedding by using an MLP regressor with two hidden layers, which is optimized via Adam optimizer with a learning rate of 0.01 and a weight decay of 0.0001. MetaOD [34] uses the default parameter settings given by the original implementation, e.g., a random forest regressor with 100 estimators and a max depth of 10. MetaGL [13] uses heterogeneous graph transformer (HGT) [58] as a graph encoder, which contains 2 layers and 4 attention heads per layer. In its G-M network, nodes are connected to their top-30 most similar nodes. The MetaGL model is optimized with Adam optimizer with a learning rate of 0.00075 and a weight decay of 0.0001. For optimizable algorithms discussed above, which involve learning low-dimensional embeddings, we consistently set the embedding size to 32.

12 Usage and Extensibility↩︎

We describe how GLEMOS can be used and extended with graphs, models, and performance records.

12.1 Graphs↩︎

To support graph data from multiple data sources, graphs in GLEMOS are represented by the class, which is a wrapper class that holds the graph data from different sources, and provides auxiliary methods serving as a common interface to different types of graph data.

The set of graphs in GLEMOS is represented by the class, which provides the functionality to load graphs for a certain graph learning task like node classification, as well as graphs from specific data sources, such as the Network Repository (NetRepo) [59] and PyTorch Geometric (PyG) [57], as shown below.

from graphs.graphset import GraphSet
graph_set = GraphSet(['netrepo', 'pyg'])
node_classification_graphs = graph_set.node_classification_graphs()

Adding New Graphs From Existing Sources. We currently provide code to incorporate graphs from the two popular graph repositories, i.e., NetRepo and PyG. Adding new graphs from these repositories can be done easily by instantiating a object for the new graph, and adding it to the list returned by the functions (e.g,. )

Adding New Graphs From New Sources (e.g., a new graph repository or your own graph data) can be done by (1) adding a script for the new data source in the package, which will parse the raw graph data, and construct a object for the new graph, and (2) registering the new data source in the class.

12.2 Models↩︎

The set of models included in GLEMOS is defined by the package. Each graph learning method and its hyperparameter settings to be searched over are defined by a separate class that inherits from the class. For example, the following code defines the GAT model set.

class GATModelSettings(ModelSettings):
    def __init__(self):
        super().__init__()
        self.variable_hyperparams = ModelSettings.alpha_ordered_dict({
            'hidden_channels': [16, 64],
            'num_layers': [1, 2, 3],
            'dropout': [0.0, 0.5],
            'heads': [1, 4],
            'concat': [True, False],
        })
    
    @classmethod
    def load_model(cls, in_channels, out_channels, **params):
        return GAT(in_channels=in_channels, out_channels=out_channels, **params)

In the above code snippet, a GAT model instance, instantiated with particular hyperparameter settings, is obtained via the method.

Adding New Hyperparameter Settings to Existing Models can be done by simply adding additional hyperparameter settings to the corresponding class.

Adding New Models can be done by (1) creating a new class for the new graph learning model, (2) specifying the set of hyperparameter settings to be searched over, and (3) completing the method, such that new instantiated model object is to be returned, as in the above code.

12.3 Performance Records↩︎

Data Splits. For consistent comparisons among different models, GLEMOS provides the data splits used in the evaluation. The class provides functionalities to generate and load data splits (e.g., node splits for node classification, and edge splits for link prediction), and the generated data splits are included in GLEMOS. Then these previously generated data splits are used when evaluating new GL models on the existing graphs in the benchmark.

Model Training and Evaluation. Given the instantiated GL model and the graph data, model training and evaluation is taken care of by the class. The same code can be used to train and evaluate new GL models on either existing or new graphs,

Parallel Processing. To perform the training and evaluation of multiple <model, graph> pairs, the aforementioned class is repeatedly executed by the class. The class is designed to support parallel processing, such that multiple processes pick up an unevaluated <model, graph> pair, and perform model training and evaluation in parallel.

13 Additional Details of the Dataset↩︎

13.1 Data Overview↩︎

GLEMOS provides the following sets of data.

  1. Performance records: performance of graph learning (GL) models on various graphs, measured in multiple metrics for the following GL tasks.

    • Node classification performances

    • Link prediction performances

  2. Graph data splits used for evaluating GL models.

    • node splits for node classification

    • edge splits for link prediction

  3. Testbed data splits (i.e., splitting over the performance matrix and the meta-graph features) to evaluate model selection algorithms.

  4. Meta-graph features: we provide the following sets of meta-graph features.

    • Mregular

    • Mgraphlets

    • Mcompact

    • Mreg+graphlets

Performance records [app:data:overview:perf], splitting of graph data [app:data:overview:graphsplits] and testbed data [app:data:overview:testbedsplits], and meta-graph features [app:data:overview:metafeats] were generated by GLEMOS by processing the graph data (). For details of these four types of data, please refer to the main text. We give further description of the graph data used in the benchmark in the next subsection.

No Personal or Offensive Contents. Note that GLEMOS does not include personal data (e.g., personally identifiable information) or offensive contents in all the data described above ([app:data:overview:perf] to [app:data:overview:metafeats]).

Figure 5: Distribution of the graphs in terms of the number of nodes (x-axis) and edges (y-axis). Each dot corresponds to a graph, and is colored depending on whether the nodes in the graph have class labels or not.

0.5em

Table 21: Distribution of the graphs in GLEMOS per graph domain. In total, GLEMOS currently covers 457 graphs drawn from 37 domains.
Social Networks 53 Synthetic-ER 15 Synthetic-SBM 6
Chemical 40 Temporal Reachability 14 Road 5
Protein 26 Synthetic-RandPart 12 Recommendation 5
Retweet 26 Interaction 10 Computer Vision 4
Biological Networks 24 Proximity 10 Flight 3
Web Graphs 22 Brain Networks 9 Misc 3
Economic Networks 17 Power Networks 8 Co-Purchase 3
Synthetic-BA 17 Citation 8 Scientific Computing 3
Friendship 16 Email 7 Infrastructure 2
Synthetic-CL 16 Technology 7 Coauthor 2
Collaboration 15 Ecology 6 Phone Call Networks 1
Synthetic-KPGM 15 Wikipedia 6
Synthetic-Others 15 Knowledgebase 6
Total Number of Graphs 457

13.2 Graph Data↩︎

Graph Data Sources. We collect graph data from two widely used public graph repositories, i.e., Network Repository (NetRepo) [59] and PyTorch Geometric (PyG) [57]. In , we provide the list of graphs in GLEMOS, and specify the data source for each graph.

Graph Data Overview. shows the distribution of the graph in GLEMOS in terms of the number of nodes (x-axis) and the number of edge (y-axis), where each dot corresponds to a graph, and is colored depending on whether nodes are labeled or not. shows the distribution of the graphs per graph domain. In total, 457 graphs drawn from 37 domains are currently used for model performance evaluation.

List of Graphs. lists the graphs in GLEMOS, and provides information on the graph size, the number of node classes, data source, and graph domain.

r c r r r c c


& & & & & &
Table– Continued from the previous page
& & & & & &

& PLC-60-30-L2 & 117,572 & 7,045,181 & 2 & NetRepo & Synthetic-Others
2 & soc-Flickr-ASU & 80,513 & 5,899,882 & 195 & NetRepo & Social Networks
3 & sc-shipsec5 & 179,104 & 4,400,152 & n/a & NetRepo & Scientific Computing
4 & web-Stanford & 281,903 & 3,985,272 & n/a & NetRepo & Web Graphs
5 & soc-youtube & 495,957 & 3,873,496 & n/a & NetRepo & Social Networks
6 & web-arabic-2005 & 163,598 & 3,494,538 & n/a & NetRepo & Web Graphs
7 & sc-shipsec1 & 140,385 & 3,415,518 & n/a & NetRepo & Scientific Computing
8 & LINKX-penn94 & 41,554 & 2,724,458 & 2 & PyG & Social Networks
9 & socfb-Penn94 & 41,536 & 2,724,440 & n/a & NetRepo & Social Networks
10 & sc-nasasrb & 54,870 & 2,622,454 & n/a & NetRepo & Scientific Computing
11 & socfb-Michigan23 & 30,147 & 2,353,032 & n/a & NetRepo & Social Networks
12 & socfb-UGA50 & 24,389 & 2,348,114 & n/a & NetRepo & Social Networks
13 & web-NotreDame & 325,729 & 2,207,671 & n/a & NetRepo & Web Graphs
14 & ca-dblp-2012 & 317,080 & 2,099,732 & n/a & NetRepo & Collaboration
15 & Entities-BGS & 333,845 & 1,832,398 & 2 & PyG & Knowledgebase
16 & KPGM-log16-16-trial1 & 46,872 & 1,818,168 & n/a & NetRepo & Synthetic-KPGM
17 & socfb-Oklahoma97 & 17,425 & 1,785,056 & n/a & NetRepo & Social Networks
18 & soc-twitter-follows-mun & 465,017 & 1,667,082 & n/a & NetRepo & Social Networks
19 & ca-MathSciNet & 332,689 & 1,641,288 & n/a & NetRepo & Collaboration
20 & AttributedGraph-PPI & 56,944 & 1,612,348 & 121 & PyG & Protein
21 & socfb-Cornell5 & 18,660 & 1,581,554 & n/a & NetRepo & Social Networks
22 & LINKX-cornell5 & 18,660 & 1,581,554 & 2 & PyG & Friendship
23 & ia-dbpedia-team-bi & 365,492 & 1,560,266 & n/a & NetRepo & Interaction
24 & rt-higgs & 425,008 & 1,465,617 & n/a & NetRepo & Retweet
25 & ca-dblp-2010 & 226,413 & 1,432,920 & n/a & NetRepo & Collaboration
26 & ia-wiki-trust-dir & 138,592 & 1,432,057 & n/a & NetRepo & Interaction
27 & soc-twitter-follows & 404,719 & 1,426,638 & n/a & NetRepo & Social Networks
28 & socfb-Virginia63 & 21,325 & 1,396,356 & n/a & NetRepo & Social Networks
29 & KPGM-log16-12-trial1 & 44,241 & 1,395,310 & n/a & NetRepo & Synthetic-KPGM
30 & BA-2_3_60 & 10,708 & 1,281,300 & n/a & NetRepo & Synthetic-BA
31 & tech-RL-caida & 190,914 & 1,215,220 & n/a & NetRepo & Technology
32 & BA-2_21_60 & 9,993 & 1,195,500 & n/a & NetRepo & Synthetic-BA
33 & KPGM-log16-10-trial1 & 42,551 & 1,177,546 & n/a & NetRepo & Synthetic-KPGM
34 & BA-2_20_60 & 9,691 & 1,159,260 & n/a & NetRepo & Synthetic-BA
35 & BA-2_15_60 & 9,340 & 1,117,140 & n/a & NetRepo & Synthetic-BA
36 & socfb-Syracuse56 & 13,653 & 1,087,964 & n/a & NetRepo & Social Networks
37 & socfb-NotreDame57 & 12,155 & 1,082,678 & n/a & NetRepo & Social Networks
38 & scc_fb-messages & 1,303 & 1,063,786 & n/a & NetRepo & Temporal Reachability
39 & socfb-UC33 & 16,808 & 1,044,294 & n/a & NetRepo & Social Networks
40 & CL-100000-1d7-trial3 & 92,967 & 1,043,480 & n/a & NetRepo & Synthetic-CL
41 & BA-2_9_60 & 8,717 & 1,042,380 & n/a & NetRepo & Synthetic-BA
42 & socfb-Duke14 & 9,885 & 1,012,874 & n/a & NetRepo & Social Networks
43 & GemsecDeezer-HR & 54,573 & 996,404 & 84 & PyG & Friendship
44 & LINKX-genius & 421,961 & 984,979 & 2 & PyG & Social Networks
45 & socfb-JMU79 & 14,070 & 971,128 & n/a & NetRepo & Social Networks
46 & scc_twitter-copen & 2,623 & 947,228 & n/a & NetRepo & Temporal Reachability
47 & BA-2_23_40 & 11,770 & 939,960 & n/a & NetRepo & Synthetic-BA
48 & soc-slashdot-zoo & 79,120 & 935,738 & n/a & NetRepo & Social Networks
49 & BA-2_11_40 & 11,337 & 905,320 & n/a & NetRepo & Synthetic-BA
50 & Flickr & 89,250 & 899,756 & 7 & PyG & Computer Vision
51 & socfb-UCSD34 & 14,948 & 886,442 & n/a & NetRepo & Social Networks
52 & rec-github & 121,709 & 879,770 & n/a & NetRepo & Recommendation
53 & CL-100000-1d8-trial3 & 92,402 & 871,580 & n/a & NetRepo & Synthetic-CL
54 & econ-psmigr3 & 3,140 & 824,702 & n/a & NetRepo & Economic Networks
55 & econ-psmigr1 & 3,140 & 824,702 & n/a & NetRepo & Economic Networks
56 & econ-psmigr2 & 3,140 & 821,562 & n/a & NetRepo & Economic Networks
57 & socfb-Yale4 & 8,578 & 810,900 & n/a & NetRepo & Social Networks
58 & CL-100000-1d9-trial3 & 91,889 & 742,469 & n/a & NetRepo & Synthetic-CL
59 & ia-wiki-Talk & 92,117 & 721,534 & n/a & NetRepo & Interaction
60 & soc-slashdot & 70,068 & 717,294 & n/a & NetRepo & Social Networks
61 & socfb-Cal65 & 11,247 & 702,716 & n/a & NetRepo & Social Networks
62 & web-sk-2005 & 121,422 & 668,838 & n/a & NetRepo & Web Graphs
63 & soc-douban & 154,908 & 654,324 & n/a & NetRepo & Social Networks
64 & ER-2_2_50 & 11,429 & 651,760 & n/a & NetRepo & Synthetic-ER
65 & BA-2_24_60-L2 & 10,693 & 639,750 & 2 & NetRepo & Synthetic-BA
66 & CL-100000-2d0-trial1 & 91,471 & 631,153 & n/a & NetRepo & Synthetic-CL
67 & ER-3_19_50 & 108,999 & 600,554 & n/a & NetRepo & Synthetic-ER
68 & ER-2_21_50 & 10,804 & 584,286 & n/a & NetRepo & Synthetic-ER
69 & GitHub & 37,700 & 578,006 & 2 & PyG & Friendship
70 & socfb-Tulane29 & 7,752 & 567,836 & n/a & NetRepo & Social Networks
71 & socfb-Wake73 & 5,372 & 558,382 & n/a & NetRepo & Social Networks
72 & CL-100000-2d1-trial2 & 90,880 & 543,035 & n/a & NetRepo & Synthetic-CL
73 & HeterophilousGraph-Tolokers & 11,758 & 519,000 & 2 & PyG & Collaboration
74 & web-spam-detection & 9,072 & 514,700 & 3 & NetRepo & Web Graphs
75 & ER-2_8_50 & 10,070 & 506,096 & n/a & NetRepo & Synthetic-ER
76 & Coauthor-Physics & 34,493 & 495,924 & 5 & PyG & Coauthor
77 & Amazon-Computers & 13,752 & 491,722 & 10 & PyG & Co-Purchase
78 & AttributedGraph-Flickr & 7,575 & 479,476 & 9 & PyG & Social Networks
79 & ia-wikiquote-user-edits & 93,445 & 476,865 & n/a & NetRepo & Interaction
80 & rec-yelp-user-business & 50,395 & 459,208 & n/a & NetRepo & Recommendation
81 & GemsecDeezer-HU & 47,538 & 445,774 & 84 & PyG & Friendship
82 & PLC-40-30-L5 & 11,025 & 437,979 & 5 & NetRepo & Synthetic-Others
83 & WikiCS & 11,701 & 431,726 & 10 & PyG & Wikipedia
84 & soc-brightkite & 56,739 & 425,890 & n/a & NetRepo & Social Networks
85 & KPGM-log14-16-trial3 & 12,545 & 425,872 & n/a & NetRepo & Synthetic-KPGM
86 & socfb-UChicago30 & 6,591 & 416,206 & n/a & NetRepo & Social Networks
87 & web-wiki-squirrel & 5,201 & 396,846 & n/a & NetRepo & Web Graphs
88 & ca-AstroPh & 17,903 & 393,944 & n/a & NetRepo & Collaboration
89 & ER-2_14_50 & 8,851 & 390,774 & n/a & NetRepo & Synthetic-ER
90 & ER-3_18_50 & 86,337 & 381,446 & n/a & NetRepo & Synthetic-ER
91 & LINKX-johnshopkins55 & 5,180 & 373,172 & 2 & PyG & Social Networks
92 & socfb-Rice & 4,087 & 369,656 & n/a & NetRepo & Social Networks
93 & scc_infect-dublin & 10,972 & 351,146 & n/a & NetRepo & Temporal Reachability
94 & AttributedGraph-BlogCatalog & 5,196 & 343,486 & 6 & PyG & Social Networks
95 & FacebookPagePage & 22,470 & 342,004 & 4 & PyG & Web Graphs
96 & web-wiki-crocodile & 11,631 & 341,691 & n/a & NetRepo & Web Graphs
97 & soc-BlogCatalog-ASU & 10,312 & 333,983 & 39 & NetRepo & Social Networks
98 & KPGM-log14-12-trial1 & 11,893 & 330,072 & n/a & NetRepo & Synthetic-KPGM
99 & road-usroads-48 & 126,146 & 323,900 & n/a & NetRepo & Road
100 & Twitch-DE & 9,498 & 315,774 & 2 & PyG & Friendship
101 & Tox21-p53 & 153,563 & 314,046 & 47 & NetRepo & Chemical
102 & socfb-UC64 & 6,833 & 310,664 & n/a & NetRepo & Social Networks
103 & Tox21-AHR & 147,772 & 302,188 & 49 & NetRepo & Chemical
104 & tech-p2p-gnutella & 62,561 & 295,756 & n/a & NetRepo & Technology
105 & Tox21-HSE & 136,239 & 277,682 & 47 & NetRepo & Chemical
106 & socfb-Wesleyan43 & 3,593 & 276,070 & n/a & NetRepo & Social Networks
107 & Mutagenicity & 131,488 & 266,894 & 14 & NetRepo & Chemical
108 & Tox21-MMP & 127,998 & 260,962 & 47 & NetRepo & Chemical
109 & Tox21-aromatase & 126,483 & 257,092 & 46 & NetRepo & Chemical
110 & GemsecDeezer-RO & 41,773 & 251,652 & 84 & PyG & Friendship
111 & NELL & 65,755 & 251,550 & 186 & PyG & Knowledgebase
112 & rec-amazon & 91,813 & 251,408 & n/a & NetRepo & Recommendation
113 & fb-CMU-Carnegie49 & 6,637 & 249,967 & 3 & NetRepo & Social Networks
114 & socfb-Middlebury45 & 3,075 & 249,220 & n/a & NetRepo & Social Networks
115 & DBLP & 26,128 & 239,566 & 4 & PyG & Knowledgebase
116 & road-luxembourg-osm & 114,599 & 239,332 & n/a & NetRepo & Road
117 & Amazon-Photo & 7,650 & 238,162 & 8 & PyG & Co-Purchase
118 & ca-HepPh & 11,204 & 235,238 & n/a & NetRepo & Collaboration
119 & EllipticBitcoin & 203,769 & 234,355 & 3 & PyG & Economic Networks
120 & Twitch-FR & 6,551 & 231,883 & 2 & PyG & Friendship
121 & KPGM-log14-8-trial1 & 10,978 & 227,990 & n/a & NetRepo & Synthetic-KPGM
122 & socfb-Trinity100 & 2,613 & 223,992 & n/a & NetRepo & Social Networks
123 & MSRC-21 & 43,644 & 223,312 & 22 & NetRepo & Computer Vision
124 & DHFR-MD & 9,380 & 222,452 & 7 & NetRepo & Chemical
125 & ER-1_5_20 & 1,050 & 220,058 & n/a & NetRepo & Synthetic-ER
126 & bio-HS-CX & 4,413 & 217,636 & n/a & NetRepo & Biological Networks
127 & WikipediaNetwork-Squirrel & 5,201 & 217,073 & 5 & PyG & Wikipedia
128 & ER-MD & 9,512 & 209,482 & 10 & NetRepo & Chemical
129 & ER-1_3_20 & 1,017 & 206,432 & n/a & NetRepo & Synthetic-ER
130 & soc-wiki-elec & 7,118 & 201,564 & n/a & NetRepo & Social Networks
131 & soc-epinions & 26,588 & 200,240 & n/a & NetRepo & Social Networks
132 & DeezerEurope & 28,281 & 185,504 & 2 & PyG & Friendship
133 & ca-CondMat & 21,363 & 182,572 & n/a & NetRepo & Collaboration
134 & LINKX-amherst41 & 2,235 & 181,908 & 2 & PyG & Friendship
135 & socfb-Oberlin44 & 2,920 & 179,824 & n/a & NetRepo & Social Networks
136 & econ-orani678 & 2,529 & 173,747 & n/a & NetRepo & Economic Networks
137 & tech-internet-as & 40,164 & 170,246 & n/a & NetRepo & Technology
138 & bio-DR-CX & 3,289 & 169,880 & n/a & NetRepo & Biological Networks
139 & Coauthor-CS & 18,333 & 163,788 & 15 & PyG & Coauthor
140 & ER-1_25_20 & 886 & 157,146 & n/a & NetRepo & Synthetic-ER
141 & HeterophilousGraph-Questions & 48,921 & 153,540 & 2 & PyG & Interaction
142 & bio-DM-CX & 4,040 & 153,434 & n/a & NetRepo & Biological Networks
143 & Entities-MUTAG & 23,644 & 148,454 & 2 & PyG & Knowledgebase
144 & copresence-SFHH & 403 & 147,114 & n/a & NetRepo & Proximity
145 & rec-movielens-tag-movies-10m & 16,528 & 142,148 & n/a & NetRepo & Recommendation
146 & scc_fb-forum & 488 & 142,022 & n/a & NetRepo & Temporal Reachability
147 & BZR-MD & 6,519 & 137,734 & 8 & NetRepo & Chemical
148 & ia-frwikinews-user-edits & 25,042 & 137,354 & n/a & NetRepo & Interaction
149 & scc_retweet & 1,206 & 131,980 & n/a & NetRepo & Temporal Reachability
150 & ER-1_6_20 & 803 & 128,654 & n/a & NetRepo & Synthetic-ER
151 & ER-1_16_10 & 1,126 & 127,004 & n/a & NetRepo & Synthetic-ER
152 & CitationFull-Cora & 19,793 & 126,842 & 70 & PyG & Citation
153 & bio-SC-HT & 2,084 & 126,054 & n/a & NetRepo & Biological Networks
154 & StochasticBlockModel-3.0 & 1,000 & 123,752 & 4 & PyG & Synthetic-SBM
155 & Twitch-ES & 4,648 & 123,412 & 2 & PyG & Friendship
156 & BA-1_9_60 & 1,056 & 123,060 & n/a & NetRepo & Synthetic-BA
157 & socfb-Swarthmore42 & 1,659 & 122,100 & n/a & NetRepo & Social Networks
158 & tech-WHOIS & 7,476 & 113,886 & n/a & NetRepo & Technology
159 & rec-movielens-user-movies-10m & 7,601 & 110,779 & n/a & NetRepo & Recommendation
160 & email-EU & 32,430 & 108,794 & n/a & NetRepo & Email
161 & StochasticBlockModel-2.5 & 1,000 & 107,416 & 4 & PyG & Synthetic-SBM
162 & bio-CE-GN & 2,220 & 107,366 & n/a & NetRepo & Biological Networks
163 & tech-as-caida2007 & 26,475 & 106,762 & n/a & NetRepo & Technology
164 & CitationFull-DBLP & 17,716 & 105,734 & 4 & PyG & Citation
165 & CL-10000-1d7-trial3 & 9,267 & 105,485 & n/a & NetRepo & Synthetic-CL
166 & cit-DBLP & 12,591 & 99,255 & n/a & NetRepo & Citation
167 & soc-anybeat & 12,645 & 98,264 & n/a & NetRepo & Social Networks
168 & KPGM-log12-16-trial3 & 3,324 & 97,110 & n/a & NetRepo & Synthetic-KPGM
169 & bio-CE-PG & 1,871 & 95,508 & n/a & NetRepo & Biological Networks
170 & web-indochina-2004 & 11,358 & 95,212 & n/a & NetRepo & Web Graphs
171 & HeterophilousGraph-Amazon-ratings & 24,492 & 93,050 & 5 & PyG & Co-Purchase
172 & BA-1_6_60 & 803 & 92,700 & n/a & NetRepo & Synthetic-BA
173 & econ-beaflw & 502 & 90,202 & n/a & NetRepo & Economic Networks
174 & StochasticBlockModel-2.0 & 1,000 & 89,892 & 4 & PyG & Synthetic-SBM
175 & BA-1_18_40 & 1,141 & 89,640 & n/a & NetRepo & Synthetic-BA
176 & CitationFull-PubMed & 19,717 & 88,648 & 3 & PyG & Citation
177 & AttributedGraph-Facebook & 4,039 & 88,234 & 193 & PyG & Social Networks
178 & CL-10000-1d8-trial3 & 9,251 & 87,601 & n/a & NetRepo & Synthetic-CL
179 & econ-beacxc & 492 & 84,754 & n/a & NetRepo & Economic Networks
180 & econ-mbeacxc & 487 & 83,776 & n/a & NetRepo & Economic Networks
181 & econ-mbeaflw & 487 & 83,776 & n/a & NetRepo & Economic Networks
182 & soc-advogato & 6,551 & 82,859 & n/a & NetRepo & Social Networks
183 & BA-1_3_40 & 1,017 & 79,720 & n/a & NetRepo & Synthetic-BA
184 & econ-beause & 507 & 79,254 & n/a & NetRepo & Economic Networks
185 & Twitch-RU & 4,385 & 78,993 & 2 & PyG & Friendship
186 & bio-HS-LC & 4,227 & 78,968 & n/a & NetRepo & Biological Networks
187 & soc-gplus & 23,628 & 78,388 & n/a & NetRepo & Social Networks
188 & ia-escorts-dynamic & 10,106 & 78,040 & n/a & NetRepo & Interaction
189 & Twitch-EN & 7,126 & 77,774 & 2 & PyG & Friendship
190 & RandomPartitionGraph-hr0.5-ad15 & 5,000 & 75,702 & 10 & PyG & Synthetic-RandPart
191 & KPGM-log12-12-trial2 & 3,214 & 75,682 & n/a & NetRepo & Synthetic-KPGM
192 & RandomPartitionGraph-hr0.7-ad15 & 5,000 & 75,042 & 10 & PyG & Synthetic-RandPart
193 & RandomPartitionGraph-hr0.1-ad15 & 5,000 & 75,026 & 10 & PyG & Synthetic-RandPart
194 & RandomPartitionGraph-hr0.3-ad15 & 5,000 & 74,978 & 10 & PyG & Synthetic-RandPart
195 & web-spam & 4,767 & 74,750 & n/a & NetRepo & Web Graphs
196 & StochasticBlockModel-1.5 & 1,000 & 74,218 & 4 & PyG & Synthetic-SBM
197 & CL-10000-1d9-trial1 & 9,177 & 73,296 & n/a & NetRepo & Synthetic-CL
198 & econ-mbeause & 492 & 72,818 & n/a & NetRepo & Economic Networks
199 & bio-SC-CC & 2,223 & 69,758 & n/a & NetRepo & Biological Networks
200 & ER-3_25_5 & 52,336 & 69,246 & n/a & NetRepo & Synthetic-ER
201 & bio-SC-GT & 1,716 & 67,974 & n/a & NetRepo & Biological Networks
202 & DHFR & 32,075 & 67,352 & 9 & NetRepo & Chemical
203 & AIDS & 31,385 & 64,780 & 38 & NetRepo & Biological Networks
204 & Twitch-PT & 1,912 & 64,510 & 2 & PyG & Friendship
205 & web-wiki-chameleon & 2,277 & 62,792 & n/a & NetRepo & Web Graphs
206 & CL-10000-2d0-trial1 & 9,130 & 62,615 & n/a & NetRepo & Synthetic-CL
207 & soc-political-retweet & 18,470 & 61,157 & 2 & NetRepo & Retweet
208 & MixHopSynthetic-Homophily-0.7 & 5,000 & 59,596 & 10 & PyG & Synthetic-Others
209 & MixHopSynthetic-Homophily-0.3 & 5,000 & 59,596 & 10 & PyG & Synthetic-Others
210 & MixHopSynthetic-Homophily-0.6 & 5,000 & 59,596 & 10 & PyG & Synthetic-Others
211 & MixHopSynthetic-Homophily-0.8 & 5,000 & 59,596 & 10 & PyG & Synthetic-Others
212 & MixHopSynthetic-Homophily-0.9 & 5,000 & 59,596 & 10 & PyG & Synthetic-Others
213 & MixHopSynthetic-Homophily-0.1 & 5,000 & 59,596 & 10 & PyG & Synthetic-Others
214 & MixHopSynthetic-Homophily-0.0 & 5,000 & 59,596 & 10 & PyG & Synthetic-Others
215 & MixHopSynthetic-Homophily-0.4 & 5,000 & 59,596 & 10 & PyG & Synthetic-Others
216 & MixHopSynthetic-Homophily-0.5 & 5,000 & 59,596 & 10 & PyG & Synthetic-Others
217 & MixHopSynthetic-Homophily-0.2 & 5,000 & 59,596 & 10 & PyG & Synthetic-Others
218 & CL-10000-2d1-trial2 & 9,078 & 59,026 & n/a & NetRepo & Synthetic-CL
219 & Entities-AIFB & 8,285 & 58,086 & 4 & PyG & Knowledgebase
220 & LastFMAsia & 7,624 & 55,612 & 18 & PyG & Friendship
221 & KPGM-log12-8-trial3 & 2,968 & 53,510 & n/a & NetRepo & Synthetic-KPGM
222 & reality-call & 27,045 & 52,050 & 2 & NetRepo & Phone Call Networks
223 & web-webbase-2001 & 16,062 & 51,186 & n/a & NetRepo & Web Graphs
224 & econ-poli-large & 15,575 & 50,511 & n/a & NetRepo & Economic Networks
225 & RandomPartitionGraph-hr0.7-ad10 & 5,000 & 49,974 & 10 & PyG & Synthetic-RandPart
226 & RandomPartitionGraph-hr0.5-ad10 & 5,000 & 49,688 & 10 & PyG & Synthetic-RandPart
227 & RandomPartitionGraph-hr0.3-ad10 & 5,000 & 49,548 & 10 & PyG & Synthetic-RandPart
228 & RandomPartitionGraph-hr0.1-ad10 & 5,000 & 49,434 & 10 & PyG & Synthetic-RandPart
229 & tech-pgp & 10,680 & 48,632 & n/a & NetRepo & Technology
230 & scc_retweet-crawl & 17,151 & 48,030 & n/a & NetRepo & Temporal Reachability
231 & ER-2_7_5 & 9,583 & 46,642 & n/a & NetRepo & Synthetic-ER
232 & BA-1_10_60-L5 & 804 & 46,410 & 5 & NetRepo & Synthetic-BA
233 & CL-10K-1d8-L5 & 10,000 & 44,896 & 5 & NetRepo & Synthetic-CL
234 & MSRC-9 & 8,968 & 43,288 & 10 & NetRepo & Computer Vision
235 & bio-SC-LC & 2,004 & 40,904 & n/a & NetRepo & Biological Networks
236 & MSRC-21C & 8,418 & 40,380 & 21 & NetRepo & Computer Vision
237 & ER-3_16_5 & 32,358 & 40,346 & n/a & NetRepo & Synthetic-ER
238 & StochasticBlockModel-0.5 & 1,000 & 40,020 & 4 & PyG & Synthetic-SBM
239 & StochasticBlockModel-1.0 & 1,000 & 40,020 & 4 & PyG & Synthetic-SBM
240 & HeterophilousGraph-Minesweeper & 10,000 & 39,402 & 2 & PyG & Synthetic-Others
241 & web-BerkStan & 12,305 & 39,000 & n/a & NetRepo & Web Graphs
242 & LINKX-reed98 & 962 & 37,624 & 2 & PyG & Friendship
243 & WikipediaNetwork-Chameleon & 2,277 & 36,101 & 5 & PyG & Wikipedia
244 & IMDB & 11,616 & 34,212 & 3 & PyG & Knowledgebase
245 & ER-1_14_5 & 829 & 34,184 & n/a & NetRepo & Synthetic-ER
246 & copresence-InVS15 & 219 & 33,450 & n/a & NetRepo & Proximity
247 & socfb-Caltech & 769 & 33,312 & n/a & NetRepo & Social Networks
248 & soc-hamsterster & 2,426 & 33,260 & n/a & NetRepo & Social Networks
249 & HeterophilousGraph-Roman-empire & 22,662 & 32,927 & 18 & PyG & Wikipedia
250 & bn-mouse-brain1 & 213 & 32,331 & n/a & NetRepo & Brain Networks
251 & inf-openflights & 2,939 & 31,354 & n/a & NetRepo & Infrastructure
252 & BZR & 14,479 & 31,070 & 10 & NetRepo & Chemical
253 & Actor & 7,600 & 30,019 & 5 & PyG & Wikipedia
254 & SW-10000-6-0d3-L5 & 10,000 & 30,000 & 5 & NetRepo & Synthetic-Others
255 & SW-10000-6-0d3-L2 & 10,000 & 30,000 & 2 & NetRepo & Synthetic-Others
256 & soc-sign-bitcoinalpha & 3,783 & 28,248 & n/a & NetRepo & Social Networks
257 & bio-HS-HT & 2,570 & 27,382 & n/a & NetRepo & Biological Networks
258 & ca-GrQc & 4,158 & 26,844 & n/a & NetRepo & Collaboration
259 & EmailEUCore & 1,005 & 25,571 & 42 & PyG & Email
260 & bio-grid-fission-yeast & 2,026 & 25,274 & n/a & NetRepo & Biological Networks
261 & RandomPartitionGraph-hr0.5-ad5 & 5,000 & 25,176 & 10 & PyG & Synthetic-RandPart
262 & RandomPartitionGraph-hr0.7-ad5 & 5,000 & 25,056 & 10 & PyG & Synthetic-RandPart
263 & RandomPartitionGraph-hr0.1-ad5 & 5,000 & 24,754 & 10 & PyG & Synthetic-RandPart
264 & RandomPartitionGraph-hr0.3-ad5 & 5,000 & 24,704 & 10 & PyG & Synthetic-RandPart
265 & ca-DBLP-kang & 2,879 & 22,652 & n/a & NetRepo & Collaboration
266 & power-bcspwr10 & 5,300 & 21,842 & n/a & NetRepo & Power Networks
267 & KPGM-log10-16-trial2 & 883 & 21,148 & n/a & NetRepo & Synthetic-KPGM
268 & email-dnc-corecipient & 906 & 20,858 & n/a & NetRepo & Email
269 & BA-1_8_10 & 1,040 & 20,690 & n/a & NetRepo & Synthetic-BA
270 & rt_lolgop & 9,765 & 20,150 & n/a & NetRepo & Retweet
271 & scc_enron-only & 146 & 19,656 & n/a & NetRepo & Temporal Reachability
272 & rt_barackobama & 9,631 & 19,547 & n/a & NetRepo & Retweet
273 & rt_justinbieber & 9,405 & 19,167 & n/a & NetRepo & Retweet
274 & SFHH-conf-sensor & 403 & 19,130 & n/a & NetRepo & Proximity
275 & PolBlogs & 1,490 & 19,025 & 2 & PyG & Web Graphs
276 & power-eris1176 & 1,176 & 18,552 & n/a & NetRepo & Power Networks
277 & AttributedGraph-Wiki & 2,405 & 17,981 & 17 & PyG & Wikipedia
278 & bn-fly-drosophila-medulla1 & 1,781 & 17,927 & n/a & NetRepo & Brain Networks
279 & web-EPA & 4,271 & 17,818 & n/a & NetRepo & Web Graphs
280 & BA-1_17_10 & 895 & 17,790 & n/a & NetRepo & Synthetic-BA
281 & rt_gmanews & 8,373 & 17,438 & n/a & NetRepo & Retweet
282 & BA-1_1_10 & 862 & 17,130 & n/a & NetRepo & Synthetic-BA
283 & rt_mittromney & 7,974 & 17,074 & n/a & NetRepo & Retweet
284 & KPGM-log10-12-trial1 & 845 & 16,934 & n/a & NetRepo & Synthetic-KPGM
285 & primary-school-proximity & 242 & 16,634 & n/a & NetRepo & Proximity
286 & BA-1_12_10 & 827 & 16,430 & n/a & NetRepo & Synthetic-BA
287 & CitationFull-CoraML & 2,995 & 16,316 & 7 & PyG & Citation
288 & rt_ksa & 7,302 & 16,216 & n/a & NetRepo & Retweet
289 & rt_onedirection & 7,987 & 16,203 & n/a & NetRepo & Retweet
290 & rt_saudi & 7,252 & 16,121 & n/a & NetRepo & Retweet
291 & ia-reality & 6,809 & 15,360 & n/a & NetRepo & Interaction
292 & econ-mahindas & 1,258 & 15,132 & n/a & NetRepo & Economic Networks
293 & ca-Erdos992 & 5,094 & 15,030 & n/a & NetRepo & Collaboration
294 & rt_dash & 6,288 & 14,870 & n/a & NetRepo & Retweet
295 & DD21 & 5,748 & 14,267 & 21 & NetRepo & Misc
296 & rt_alwefaq & 4,171 & 14,122 & n/a & NetRepo & Retweet
297 & Airports-USA & 1,190 & 13,599 & 4 & PyG & Flight
298 & tech-routers-rf & 2,113 & 13,264 & n/a & NetRepo & Technology
299 & power-US-Grid & 4,941 & 13,188 & n/a & NetRepo & Power Networks
300 & Peking-1 & 3,341 & 13,150 & 190 & NetRepo & Social Networks
301 & web-edu & 3,031 & 12,948 & n/a & NetRepo & Web Graphs
302 & rt_uae & 5,248 & 12,772 & n/a & NetRepo & Retweet
303 & rt_oman & 4,904 & 12,456 & n/a & NetRepo & Retweet
304 & scc_infect-hyper & 113 & 12,444 & n/a & NetRepo & Temporal Reachability
305 & econ-poli & 4,008 & 12,246 & n/a & NetRepo & Economic Networks
306 & KPGM-log10-8-trial2 & 796 & 12,080 & n/a & NetRepo & Synthetic-KPGM
307 & rt_p2 & 4,902 & 12,034 & n/a & NetRepo & Retweet
308 & contacts-prox-high-school-2013 & 327 & 11,636 & n/a & NetRepo & Proximity
309 & rt_gop & 4,687 & 11,058 & n/a & NetRepo & Retweet
310 & rt_tcot & 4,547 & 11,004 & n/a & NetRepo & Retweet
311 & email-univ & 1,133 & 10,902 & n/a & NetRepo & Email
312 & CitationFull-CiteSeer & 4,230 & 10,674 & 6 & PyG & Citation
313 & ca-cora & 2,708 & 10,556 & n/a & NetRepo & Collaboration
314 & PTC-FR & 5,110 & 10,532 & 19 & NetRepo & Chemical
315 & DD6 & 4,152 & 10,320 & 20 & NetRepo & Misc
316 & PTC-FM & 4,925 & 10,110 & 18 & NetRepo & Chemical
317 & PTC-MR & 4,915 & 10,108 & 18 & NetRepo & Chemical
318 & CL-1000-1d7-trial2 & 932 & 9,755 & n/a & NetRepo & Synthetic-CL
319 & PTC-MM & 4,695 & 9,624 & 20 & NetRepo & Chemical
320 & bio-DM-HT & 2,989 & 9,320 & n/a & NetRepo & Biological Networks
321 & CL-1000-1d7-trial1 & 928 & 9,279 & n/a & NetRepo & Synthetic-CL
322 & rt_islam & 4,497 & 9,232 & n/a & NetRepo & Retweet
323 & scc_rt_lolgop & 273 & 9,020 & n/a & NetRepo & Temporal Reachability
324 & rt_tlot & 3,665 & 8,949 & n/a & NetRepo & Retweet
325 & rt_lebanon & 3,961 & 8,871 & n/a & NetRepo & Retweet
326 & email-dnc-leak & 1,891 & 8,849 & n/a & NetRepo & Email
327 & TerroristRel & 881 & 8,592 & 2 & NetRepo & Collaboration
328 & bio-SC-TS & 636 & 7,918 & n/a & NetRepo & Biological Networks
329 & biogrid-human & 2,005 & 7,918 & n/a & NetRepo & Biological Networks
330 & rt_occupy & 3,225 & 7,883 & n/a & NetRepo & Retweet
331 & copresence-InVS13 & 95 & 7,830 & n/a & NetRepo & Proximity
332 & rt_damascus & 3,052 & 7,738 & n/a & NetRepo & Retweet
333 & rt_occupywallstnyc & 3,609 & 7,663 & n/a & NetRepo & Retweet
334 & biogrid-worm & 1,930 & 7,152 & n/a & NetRepo & Biological Networks
335 & road-minnesota & 2,642 & 6,606 & n/a & NetRepo & Road
336 & power-bcspwr09 & 1,723 & 6,511 & n/a & NetRepo & Power Networks
337 & email-radoslaw & 167 & 6,501 & n/a & NetRepo & Email
338 & bio-CE-GT & 924 & 6,478 & n/a & NetRepo & Biological Networks
339 & bn-macaque-rhesus-brain1 & 242 & 6,108 & n/a & NetRepo & Brain Networks
340 & CL-1000-2d0-trial3 & 916 & 6,004 & n/a & NetRepo & Synthetic-CL
341 & Airports-Europe & 399 & 5,995 & 4 & PyG & Flight
342 & bio-CE-HT & 2,617 & 5,970 & n/a & NetRepo & Biological Networks
343 & CL-1000-2d0-trial2 & 899 & 5,861 & n/a & NetRepo & Synthetic-CL
344 & soc-wiki-Vote & 889 & 5,828 & n/a & NetRepo & Social Networks
345 & rt_assad & 2,139 & 5,574 & n/a & NetRepo & Retweet
346 & web-google & 1,299 & 5,546 & n/a & NetRepo & Web Graphs
347 & infect-dublin & 410 & 5,530 & n/a & NetRepo & Proximity
348 & CL-1000-2d1-trial2 & 911 & 5,457 & n/a & NetRepo & Synthetic-CL
349 & AttributedGraph-Cora & 2,708 & 5,429 & 7 & PyG & Citation
350 & econ-wm1 & 260 & 4,943 & n/a & NetRepo & Economic Networks
351 & rt_voteonedirection & 2,280 & 4,928 & n/a & NetRepo & Retweet
352 & econ-wm3 & 259 & 4,918 & n/a & NetRepo & Economic Networks
353 & econ-wm2 & 259 & 4,908 & n/a & NetRepo & Economic Networks
354 & AttributedGraph-CiteSeer & 3,312 & 4,715 & 6 & PyG & Citation
355 & web-polblogs & 643 & 4,560 & n/a & NetRepo & Web Graphs
356 & bn-macaque-rhesus-interareal-cortical2 & 93 & 4,524 & n/a & NetRepo & Brain Networks
357 & infect-hyper & 113 & 4,392 & n/a & NetRepo & Proximity
358 & eco-foodweb-baydry & 128 & 4,212 & n/a & NetRepo & Ecology
359 & eco-foodweb-baywet & 128 & 4,150 & n/a & NetRepo & Ecology
360 & eco-florida & 128 & 4,150 & n/a & NetRepo & Ecology
361 & power-1138-bus & 1,138 & 4,054 & n/a & NetRepo & Power Networks
362 & KPGM-log8-12-trial3 & 230 & 3,522 & n/a & NetRepo & Synthetic-KPGM
363 & ca-CSphd & 1,882 & 3,480 & n/a & NetRepo & Collaboration
364 & DD242 & 1,284 & 3,303 & 20 & NetRepo & Misc
365 & bio-CE-LC & 1,387 & 3,296 & n/a & NetRepo & Biological Networks
366 & biogrid-mouse & 1,450 & 3,272 & n/a & NetRepo & Biological Networks
367 & power-685-bus & 685 & 3,249 & n/a & NetRepo & Power Networks
368 & bn-mouse-kasthuri-v4 & 1,029 & 3,118 & n/a & NetRepo & Brain Networks
369 & KPGM-log8-10-trial3 & 224 & 3,040 & n/a & NetRepo & Synthetic-KPGM
370 & ia-crime-moreno & 829 & 2,948 & n/a & NetRepo & Interaction
371 & eco-mangwet & 97 & 2,892 & n/a & NetRepo & Ecology
372 & inf-euroroad & 1,174 & 2,834 & n/a & NetRepo & Infrastructure
373 & road-euroroad & 1,174 & 2,834 & n/a & NetRepo & Road
374 & bn-macaque-rhesus-cerebral-cortex1 & 91 & 2,802 & n/a & NetRepo & Brain Networks
375 & copresence-LH10 & 73 & 2,762 & n/a & NetRepo & Proximity
376 & DD_g106 & 574 & 2,710 & n/a & NetRepo & Protein
377 & KPGM-log8-8-trial3 & 215 & 2,606 & n/a & NetRepo & Synthetic-KPGM
378 & road-ChicagoRegional & 1,467 & 2,596 & n/a & NetRepo & Road
379 & power-662-bus & 662 & 2,474 & n/a & NetRepo & Power Networks
380 & DD_g105 & 423 & 2,384 & n/a & NetRepo & Protein
381 & hospital-ward-proximity & 75 & 2,278 & n/a & NetRepo & Proximity
382 & DD_g108 & 483 & 2,274 & n/a & NetRepo & Protein
383 & bio-DM-LC & 658 & 2,258 & n/a & NetRepo & Biological Networks
384 & scc_rt_gmanews & 135 & 2,156 & n/a & NetRepo & Temporal Reachability
385 & biogrid-yeast & 836 & 2,098 & n/a & NetRepo & Biological Networks
386 & rt-twitter-copen & 761 & 2,058 & n/a & NetRepo & Retweet
387 & DD_g100 & 349 & 2,010 & n/a & NetRepo & Protein
388 & DD_g104 & 372 & 1,998 & n/a & NetRepo & Protein
389 & DD_g115 & 336 & 1,892 & n/a & NetRepo & Protein
390 & scc_rt_occupywallstnyc & 127 & 1,862 & n/a & NetRepo & Temporal Reachability
391 & soc-physicians & 241 & 1,846 & n/a & NetRepo & Social Networks
392 & ca-netscience & 379 & 1,828 & n/a & NetRepo & Collaboration
393 & eco-everglades & 69 & 1,765 & n/a & NetRepo & Ecology
394 & biogrid-plant & 523 & 1,676 & n/a & NetRepo & Biological Networks
395 & power-494-bus & 494 & 1,666 & n/a & NetRepo & Power Networks
396 & DD_g1021 & 329 & 1,574 & n/a & NetRepo & Protein
397 & DD_g11 & 312 & 1,522 & n/a & NetRepo & Protein
398 & ia-workplace-contacts & 92 & 1,510 & n/a & NetRepo & Interaction
399 & bn-cat-mixed-species-brain1 & 65 & 1,460 & n/a & NetRepo & Brain Networks
400 & DD_g1022 & 294 & 1,460 & n/a & NetRepo & Protein
401 & DD_g101 & 306 & 1,456 & n/a & NetRepo & Protein
402 & DD_g103 & 265 & 1,294 & n/a & NetRepo & Protein
403 & email-enron-only & 143 & 1,246 & n/a & NetRepo & Email
404 & bn-macaque-rhesus-brain2 & 91 & 1,164 & n/a & NetRepo & Brain Networks
405 & DD_g1006 & 246 & 1,136 & n/a & NetRepo & Protein
406 & Airports-Brazil & 131 & 1,074 & 4 & PyG & Flight
407 & scc_rt_justinbieber & 62 & 884 & n/a & NetRepo & Temporal Reachability
408 & DD_g1000 & 183 & 816 & n/a & NetRepo & Protein
409 & DD_g1017 & 162 & 752 & n/a & NetRepo & Protein
410 & scc_rt_alwefaq & 72 & 710 & n/a & NetRepo & Temporal Reachability
411 & DD_g1019 & 131 & 706 & n/a & NetRepo & Protein
412 & eco-stmarks & 54 & 703 & n/a & NetRepo & Ecology
413 & DD_g1030 & 136 & 702 & n/a & NetRepo & Protein
414 & DD_g10 & 146 & 656 & n/a & NetRepo & Protein
415 & internet-industry-partnerships & 219 & 631 & 3 & NetRepo & Collaboration
416 & soc-student-coop & 185 & 622 & n/a & NetRepo & Social Networks
417 & DD_g1016 & 113 & 582 & n/a & NetRepo & Protein
418 & soc-highschool-moreno & 70 & 548 & n/a & NetRepo & Social Networks
419 & DD_g1009 & 129 & 544 & n/a & NetRepo & Protein
420 & webkb-wisc & 265 & 530 & 5 & NetRepo & Web Graphs
421 & WebKB-Wisconsin & 251 & 515 & 5 & PyG & Web Graphs
422 & DD_g1015 & 102 & 488 & n/a & NetRepo & Protein
423 & DD_g1004 & 94 & 460 & n/a & NetRepo & Protein
424 & scc_rt_barackobama & 80 & 452 & n/a & NetRepo & Temporal Reachability
425 & DD_g1027 & 108 & 446 & n/a & NetRepo & Protein
426 & bn-mouse-visual-cortex2 & 193 & 428 & n/a & NetRepo & Brain Networks
427 & DD_g1025 & 88 & 410 & n/a & NetRepo & Protein
428 & WebKB-Texas & 183 & 325 & 5 & PyG & Web Graphs
429 & WebKB-Cornell & 183 & 298 & 5 & PyG & Web Graphs
430 & ENZYMES_g297 & 121 & 298 & n/a & NetRepo & Chemical
431 & ENZYMES_g296 & 125 & 282 & n/a & NetRepo & Chemical
432 & DD_g1028 & 72 & 274 & n/a & NetRepo & Protein
433 & ENZYMES_g118 & 95 & 242 & n/a & NetRepo & Chemical
434 & ENZYMES_g504 & 66 & 240 & n/a & NetRepo & Chemical
435 & DD_g1003 & 53 & 232 & n/a & NetRepo & Protein
436 & ENZYMES_g103 & 59 & 230 & n/a & NetRepo & Chemical
437 & ENZYMES_g594 & 52 & 228 & n/a & NetRepo & Chemical
438 & ENZYMES_g355 & 66 & 224 & n/a & NetRepo & Chemical
439 & NCI1_g3139 & 107 & 224 & n/a & NetRepo & Chemical
440 & NCI1_g1863 & 107 & 222 & n/a & NetRepo & Chemical
441 & ENZYMES_g575 & 51 & 220 & n/a & NetRepo & Chemical
442 & ENZYMES_g526 & 58 & 220 & n/a & NetRepo & Chemical
443 & ENZYMES_g199 & 62 & 216 & n/a & NetRepo & Chemical
444 & ENZYMES_g279 & 60 & 214 & n/a & NetRepo & Chemical
445 & NCI1_g3585 & 105 & 214 & n/a & NetRepo & Chemical
446 & ENZYMES_g527 & 57 & 214 & n/a & NetRepo & Chemical
447 & NCI1_g1677 & 102 & 212 & n/a & NetRepo & Chemical
448 & NCI1_g3711 & 89 & 212 & n/a & NetRepo & Chemical
449 & ENZYMES_g224 & 54 & 210 & n/a & NetRepo & Chemical
450 & NCI1_g3990 & 90 & 210 & n/a & NetRepo & Chemical
451 & ENZYMES_g291 & 62 & 208 & n/a & NetRepo & Chemical
452 & NCI1_g2079 & 88 & 206 & n/a & NetRepo & Chemical
453 & NCI1_g3444 & 93 & 204 & n/a & NetRepo & Chemical
454 & NCI1_g1893 & 96 & 204 & n/a & NetRepo & Chemical
455 & NCI1_g2082 & 86 & 202 & n/a & NetRepo & Chemical
456 & ENZYMES_g598 & 55 & 200 & n/a & NetRepo & Chemical
457 & KarateClub & 34 & 156 & 4 & PyG & Friendship

14 Hosting, Licensing, and Maintenance Plan↩︎

14.1 Hosting↩︎

The code and data of GLEMOS are hosted at https://github.com/facebookresearch/glemos.

14.2 Licensing↩︎

Data License. Among data included in GLEMOS, performance records [app:data:overview:perf], splitting of graph data [app:data:overview:graphsplits] and testbed data [app:data:overview:testbedsplits], and meta-graph features [app:data:overview:metafeats] were generated by GLEMOS by processing the graph data (). They are under the CC BY-NC 4.0 license. Graph data () are from two public graph repositories, i.e., Network Repository [59] and PyTorch Geometric [57]. Since all graph data used in this work are from these repositories, please refer to the corresponding repository for the license of graph datasets.

Code License. The GLEMOS codebase is under the CC BY-NC 4.0 license.

14.3 Maintenance↩︎

We will maintain and continue to develop GLEMOS for the long term. Specifically, we aim to improve and expand GLEMOS in two aspects, i.e., benchmark data and model selection algorithms.

Benchmark Data. We will monitor newly available graphs from various domains, as well as new graph learning (GL) models, and expand GLEMOS with those new graphs and GL models as follows.

  • Performance Records: We will evaluate new GL models on both new and existing graphs for applicable GL tasks, and evaluate existing GL models on the new graphs as well. Those new results will enrich our collection of performance records.

  • Graph Data Splits: Node and edge splits to evaluate GL models on the new graphs will be added.

  • Testbed Data Splits: Testbed data splits (e.g., over the performance matrix) used to evaluate model selection algorithms will be shared.

  • Meta-Graph Features: We will generate different sets of meta-graph features for the new graphs.

Model Selection Algorithms. As we discuss in the main text, there exist multiple directions for future work to improve the effectiveness and generalization capability of GL model selection algorithms. We will continue to monitor the latest advancement of this area, and expand GLEMOS with the state-of-the-art GL model selection algorithms.

15 Author Statement↩︎

We take all responsibilities of the benchmark data. In case of violation of any rights or data licenses, we will take necessary actions, such as revising the problematic data, or removing it from the benchmark.

References↩︎

[1]
Feng Xia, Ke Sun, Shuo Yu, Abdul Aziz, Liangtian Wan, Shirui Pan, and Huan Liu. Graph learning: A survey. IEEE Trans. Artif. Intell., 2(2):109–127, 2021.
[2]
Ziwei Zhang, Peng Cui, and Wenwu Zhu. Deep learning on graphs: A survey. IEEE Trans. Knowl. Data Eng., 34(1):249–270, 2022.
[3]
Wenqi Fan, Yao Ma, Qing Li, Yuan He, Yihong Eric Zhao, Jiliang Tang, and Dawei Yin. Graph neural networks for social recommendation. In WWW, pages 417–426. ACM, 2019.
[4]
Namyong Park, Andrey Kan, Xin Luna Dong, Tong Zhao, and Christos Faloutsos. : Inferring node importance in a knowledge graph from multiple input signals. In KDD, pages 503–512. ACM, 2020.
[5]
Weiwei Jiang and Jiayun Luo. Graph neural network for traffic forecasting: A survey. CoRR, abs/2101.11174, 2021.
[6]
Chang Su, Jie Tong, Yongjun Zhu, Peng Cui, and Fei Wang. Network embedding in biomedical data science. Briefings Bioinform., 21(1):182–197, 2020.
[7]
Junying Li, Deng Cai, and Xiaofei He. Learning graph-level representation for drug discovery. CoRR, abs/1709.03741, 2017.
[8]
Lei Cai, Zhengzhang Chen, Chen Luo, Jiaping Gui, Jingchao Ni, Ding Li, and Haifeng Chen. Structural temporal graph neural networks for anomaly detection in dynamic graphs. In CIKM, pages 3747–3756. ACM, 2021.
[9]
Namyong Park, Fuchen Liu, Purvanshi Mehta, Dana Cristofor, Christos Faloutsos, and Yuxiao Dong. : Jointly modeling event time and network structure for reasoning over temporal knowledge graphs. In WSDM, pages 794–803. ACM, 2022.
[10]
Namyong Park, Ryan A. Rossi, Eunyee Koh, Iftikhar Ahamath Burhanuddin, Sungchul Kim, Fan Du, Nesreen K. Ahmed, and Christos Faloutsos. : Contrastive graph clustering for community detection and tracking. In WWW, pages 1115–1126. ACM, 2022.
[11]
David H. Wolpert and William G. Macready. No free lunch theorems for optimization. IEEE Trans. Evol. Comput., 1(1):67–82, 1997.
[12]
Jiaxuan You, Zhitao Ying, and Jure Leskovec. Design space for graph neural networks. In NeurIPS, 2020.
[13]
Namyong Park, Ryan A. Rossi, Nesreen Ahmed, and Christos Faloutsos. MetaGL: Evaluation-free selection of graph learning models via meta-learning. In The Eleventh International Conference on Learning Representations, 2023.
[14]
Kaidi Cao, Jiaxuan You, Jiaju Liu, and Jure Leskovec. Autotransfer: Automl with knowledge transfer - an application to graph neural networks. In ICLR. OpenReview.net, 2023.
[15]
Yijian Qin, Ziwei Zhang, Xin Wang, Zeyang Zhang, and Wenwu Zhu. Nas-bench-graph: Benchmarking graph neural architecture search. In NeurIPS, 2022.
[16]
James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. J. Mach. Learn. Res., 13:281–305, 2012.
[17]
Petro Liashchynskyi and Pavlo Liashchynskyi. Grid search, random search, genetic algorithm: A big comparison for NAS. CoRR, abs/1912.06059, 2019.
[18]
Lisha Li, Kevin G. Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, and Ameet Talwalkar. Hyperband: A novel bandit-based approach to hyperparameter optimization. J. Mach. Learn. Res., 18:185:1–185:52, 2017.
[19]
Daniel Golovin, Benjamin Solnik, Subhodeep Moitra, Greg Kochanski, John Karro, and D. Sculley. Google vizier: A service for black-box optimization. In KDD, pages 1487–1495. ACM, 2017.
[20]
Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. Practical bayesian optimization of machine learning algorithms. In NIPS, pages 2960–2968, 2012.
[21]
Jia Wu, Xiu-Yun Chen, Hao Zhang, Li-Dong Xiong, Hang Lei, and Si-Hao Deng. Hyperparameter optimization for machine learning models based on bayesian optimization. Journal of Electronic Science and Technology, 17(1):26–40, 2019.
[22]
Stefan Falkner, Aaron Klein, and Frank Hutter. robust and efficient hyperparameter optimization at scale. In ICML, volume 80 of Proceedings of Machine Learning Research, pages 1436–1445. PMLR, 2018.
[23]
Yang Gao, Hong Yang, Peng Zhang, Chuan Zhou, and Yue Hu. Graph neural architecture search. In IJCAI, pages 1403–1409, 2020.
[24]
Kaixiong Zhou, Qingquan Song, Xiao Huang, and Xia Hu. : Neural architecture search of graph neural networks. CoRR, abs/1909.03184, 2019.
[25]
Kwei-Herng Lai, Daochen Zha, Kaixiong Zhou, and Xia Hu. Policy-gnn: Aggregation optimization for graph neural networks. In KDD, pages 461–471. ACM, 2020.
[26]
Chenyang Bu, Yi Lu, and Fei Liu. Automatic graph learning with evolutionary algorithms: An experimental study. In PRICAI (1), volume 13031 of Lecture Notes in Computer Science, pages 513–526. Springer, 2021.
[27]
Ke Tu, Jianxin Ma, Peng Cui, Jian Pei, and Wenwu Zhu. : Hyperparameter optimization for massive network embedding. In KDD, pages 216–225. ACM, 2019.
[28]
Ronghang Zhu, Zhiqiang Tao, Yaliang Li, and Sheng Li. Automated graph learning via population based self-tuning GCN. In SIGIR, pages 2096–2100. ACM, 2021.
[29]
Mengying Guo, Tao Yi, Yuqing Zhu, and Yungang Bao. : Just-in-time hyperparameter tuning for network embedding algorithms. CoRR, abs/2101.06427, 2021.
[30]
Salisu Mamman Abdulrahman, Pavel Brazdil, Jan N. van Rijn, and Joaquin Vanschoren. Speeding up algorithm selection using average ranking and active testing by introducing runtime. Mach. Learn., 107(1):79–108, 2018.
[31]
Mladen Nikolić, Filip Marić, and Predrag Janičić. Simple algorithm portfolio for sat. Artificial Intelligence Review, 40(4):457–465, 2013.
[32]
Serdar Kadioglu, Yuri Malitsky, Meinolf Sellmann, and Kevin Tierney. - instance-specific algorithm configuration. In ECAI, volume 215 of Frontiers in Artificial Intelligence and Applications, pages 751–756. IOS Press, 2010.
[33]
Lin Xu, Frank Hutter, Jonathan Shen, Holger H Hoos, and Kevin Leyton-Brown. Satzilla2012: Improved algorithm selection based on cost-sensitive classification models. Proceedings of SAT Challenge, pages 57–58, 2012.
[34]
Yue Zhao, Ryan A. Rossi, and Leman Akoglu. Automatic unsupervised outlier model selection. In NeurIPS, pages 4489–4502, 2021.
[35]
Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR (Poster). OpenReview.net, 2017.
[36]
Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In ICLR (Poster). OpenReview.net, 2018.
[37]
Felix Wu, Amauri H. Souza Jr., Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Q. Weinberger. Simplifying graph convolutional networks. In ICML, volume 97 of Proceedings of Machine Learning Research, pages 6861–6871. PMLR, 2019.
[38]
Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In KDD, pages 855–864. ACM, 2016.
[39]
Petar Velickovic, William Fedus, William L. Hamilton, Pietro Liò, Yoshua Bengio, and R. Devon Hjelm. Deep graph infomax. In ICLR (Poster). OpenReview.net, 2019.
[40]
Bin Luo, Richard C. Wilson, and Edwin R. Hancock. Spectral embedding of graphs. Pattern Recognit., 36(10):2213–2230, 2003.
[41]
Xiaojin Zhu and Zoubin Ghahramani. Learning from labeled and unlabeled data with label propagation. Technical report, Carnegie Mellon University, 2002.
[42]
Muhan Zhang and Yixin Chen. Link prediction based on graph neural networks. In NeurIPS, pages 5171–5181, 2018.
[43]
David Liben-Nowell and Jon M. Kleinberg. The link-prediction problem for social networks. J. Assoc. Inf. Sci. Technol., 58(7):1019–1031, 2007.
[44]
Yining Wang, Liwei Wang, Yuanzhi Li, Di He, and Tie-Yan Liu. A theoretical analysis of NDCG type ranking measures. In COLT, volume 30 of JMLR Workshop and Conference Proceedings, pages 25–54. JMLR.org, 2013.
[45]
William L. Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In NIPS, pages 1024–1034, 2017.
[46]
Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In ICLR. OpenReview.net, 2019.
[47]
Shyam A. Tailor, Felix L. Opolka, Pietro Liò, and Nicholas Donald Lane. Do we need anisotropic graph neural networks? In ICLR. OpenReview.net, 2022.
[48]
Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In NIPS, pages 3837–3845, 2016.
[49]
Gabriele Corso, Luca Cavalleri, Dominique Beaini, Pietro Liò, and Petar Velickovic. Principal neighbourhood aggregation for graph nets. In NeurIPS, 2020.
[50]
Shaosheng Cao, Wei Lu, and Qiongkai Xu. Grarep: Learning graph representations with global structural information. In CIKM, pages 891–900. ACM, 2015.
[51]
Tao Zhou, Linyuan Lü, and Yi-Cheng Zhang. Predicting missing links via local information. The European Physical Journal B, 71:623–630, 2009.
[52]
Mustafa Misir and Michèle Sebag. Alors: An algorithm recommender system. Artif. Intell., 244:291–314, 2017.
[53]
Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. Neural collaborative filtering. In WWW, pages 173–182. ACM, 2017.
[54]
Nesreen K Ahmed, Theodore L Willke, and Ryan A Rossi. Estimation of local subgraph counts. In 2016 IEEE International Conference on Big Data (Big Data), pages 586–595. IEEE, 2016.
[55]
Luca Cappelletti, Tommaso Fontana, Elena Casiraghi, Vida Ravanmehr, Tiffany J. Callahan, Marcin P. Joachimiak, Christopher J. Mungall, Peter N. Robinson, Justin Reese, and Giorgio Valentini. Grape: fast and scalable graph processing and embedding, 2021.
[56]
Aric Hagberg, Pieter Swart, and Daniel S Chult. Exploring network structure, dynamics, and function using networkx. Technical report, Los Alamos National Lab.(LANL), Los Alamos, NM (United States), 2008.
[57]
Matthias Fey and Jan Eric Lenssen. Fast graph representation learning with pytorch geometric. CoRR, abs/1903.02428, 2019.
[58]
Ziniu Hu, Yuxiao Dong, Kuansan Wang, and Yizhou Sun. Heterogeneous graph transformer. In WWW, pages 2704–2710. ACM / IW3C2, 2020.
[59]
Ryan A. Rossi and Nesreen K. Ahmed. The network data repository with interactive graph analytics and visualization. In AAAI, pages 4292–4293. AAAI Press, 2015.

  1. Correspondence: namyongp@meta.com↩︎

  2. https://numpy.org/↩︎

  3. https://scipy.org/↩︎

  4. http://dgl.ai/↩︎