In this tutorial, you will learn the basic workflow of PyGOD with an example of DOMINANT. This tutorial assumes that you have basic familiarity with PyTorch and PyTorch Geometric (PyG).
(Time estimate: 5 minutes)
PyGOD uses torch_geometric.data.Data to handle the data. Here, we
use Cora, a PyG built-in dataset, as an example. To load your own
dataset into PyGOD, you can refer to the creating your own datasets
tutorial in PyG.
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.x
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.tx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.allx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.y
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ty
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ally
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.graph
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.test.index
Processing...
Done!
Because Cora has no ground-truth outlier labels, we follow the method used by DOMINANT to inject 100 contextual outliers and 100 structural outliers into the graph. Note: if your dataset already contains the outliers you want to detect, you don't have to inject more outliers.
We also provide various types of built-in datasets. You can load them
by passing the name of the dataset to the data loading utility.
See the data repository
for more details.
You can use any detector by simply initializing it without passing any
arguments; the default hyperparameters are ready to use. Of course, you
can also customize the hyperparameters by passing arguments. Here, we use
pygod.detector.DOMINANT as an example.
To train the detector on the loaded data, simply feed the
torch_geometric.data.Data object into the detector via the fit method.
DOMINANT(act=<function relu at 0x7f82b8032940>, backbone=<class 'torch_geometric.nn.models.basic_gnn.GCN'>, batch_size=2708, compile_model=False, contamination=0.1, dropout=0.0, epoch=100, gpu=None, hid_dim=64, lr=0.004, num_layers=4, num_neigh=[-1, -1, -1, -1], save_emb=False, sigmoid_s=False, verbose=0, weight=0.5, weight_decay=0.0)
After training, the detector is ready to use. You can use the detector to predict the binary labels, raw outlier scores, outlier probabilities, and prediction confidence. Here, we use the loaded data as an example.
Labels: tensor([0, 0, 0, ..., 0, 0, 0]) Raw scores: tensor([1.0302, 0.9661, 1.2206, ..., 0.6130, 1.1309, 1.1342]) Probability: tensor([0.0749, 0.0640, 0.1073, ..., 0.0039, 0.0920, 0.0926]) Confidence: tensor([1., 1., 1., ..., 1., 1., 1.])
To evaluate the performance of the outlier detector with the AUC score, you can:
AUC Score: 0.7675294648395646
Total running time of the script: (0 minutes 45.105 seconds)