Model Example#

In this introductory tutorial, you will learn the basic workflow of PyGOD with a worked example using DOMINANT. This tutorial assumes basic familiarity with PyTorch and PyTorch Geometric (PyG).

(Time estimate: 5 minutes)

Data Loading#

PyGOD uses torch_geometric.data.Data to handle data. Here, we use Cora, a built-in PyG dataset, as an example. To load your own dataset into PyGOD, refer to the [creating your own datasets tutorial](https://pytorch-geometric.readthedocs.io/en/latest/notes/create_dataset.html) in PyG; a minimal sketch of building a Data object by hand follows the output below.

import torch_geometric.transforms as T
from torch_geometric.datasets import Planetoid

data = Planetoid('./data/Cora', 'Cora', transform=T.NormalizeFeatures())[0]

Out:

Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.x
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.tx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.allx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.y
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ty
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ally
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.graph
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.test.index
Processing...
Done!
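
If you want to run PyGOD on your own graph instead, the minimal requirement is a Data object with node features and an edge index. Below is a short sketch; the tensors are hypothetical placeholders, not part of this tutorial's dataset.

import torch
from torch_geometric.data import Data

# hypothetical toy graph: 4 nodes with 16-dimensional features
x = torch.randn(4, 16)
# edges as a [2, num_edges] tensor of (source, target) node indices
edge_index = torch.tensor([[0, 1, 2, 3],
                           [1, 2, 3, 0]])
custom_data = Data(x=x, edge_index=edge_index)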

Because Cora has no ground-truth outlier labels, we follow the method used in DOMINANT to inject 100 attribute outliers and 100 structure outliers into the graph. Note: if your dataset already contains the outliers you want to detect, you do not need to inject more.

import torch
from pygod.utils import gen_attribute_outliers, gen_structure_outliers

# inject 100 attribute outliers (k controls the candidate pool size)
data, ya = gen_attribute_outliers(data, n=100, k=50)
# inject 100 structure outliers as 10 cliques of 10 nodes each
data, ys = gen_structure_outliers(data, m=10, n=10)
data.y = torch.logical_or(ys, ya).int()  # outlier if flagged by either
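
As a quick sanity check, you can count how many nodes ended up labeled as outliers; the union may be slightly below 200 if the two injections happen to overlap.

print('Total injected outliers:', int(data.y.sum()))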

Initialization#

You can initialize any model without passing arguments; sensible default hyperparameters are provided out of the box. You can also customize the model by passing keyword arguments (see the sketch after the output below). Here, we use pygod.models.DOMINANT as an example.

from pygod.models import DOMINANT

model = DOMINANT()

Out:

/home/docs/checkouts/readthedocs.org/user_builds/py-god/checkouts/v0.2.0/pygod/utils/utility.py:49: UserWarning: The cuda is not available. Set to cpu.
  warnings.warn('The cuda is not available. Set to cpu.')
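
If the defaults do not fit your task, pass keyword arguments at initialization. Here is a sketch using a few of the hyperparameters printed in the training output below; the values are illustrative, not recommendations.

# illustrative values; parameter names match those printed by fit() below
model = DOMINANT(hid_dim=64, num_layers=4, epoch=5, lr=0.005)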

Training#

To train the model on the loaded data, simply pass the torch_geometric.data.Data object to the model's fit method.

model.fit(data)

Out:

DOMINANT(act=<function relu at 0x7fc9f608e700>, alpha=0.8, batch_size=2708,
     contamination=0.1, dropout=0.3, epoch=5, gpu=None, hid_dim=64,
     lr=0.005, num_layers=4, num_neigh=-1, verbose=False, weight_decay=0.0)
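
After fitting, the detector exposes PyOD-style attributes. model.threshold_ is used in the evaluation step below; decision_scores_, assumed here from the PyOD convention that PyGOD follows, stores the raw outlier scores of the training nodes.

# threshold_ is used later in this tutorial; decision_scores_ is an
# assumption based on the PyOD-style API
print('Threshold:', model.threshold_)
print('First five training scores:', model.decision_scores_[:5])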

Inference#

Your model is now ready to use. PyGOD provides several inference methods.

To predict the labels only:

labels = model.predict(data)
print('Labels:')
print(labels)

Out:

Labels:
[0 0 0 ... 0 0 0]

To predict raw outlier scores:

outlier_scores = model.decision_function(data)
print('Raw scores:')
print(outlier_scores)

Out:

Raw scores:
[0.61956072 0.52426875 0.63175178 ... 0.39904782 0.62312496 0.63099992]
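
Labels and raw scores are linked through the fitted threshold: a node is predicted as an outlier when its score exceeds model.threshold_. A sketch of that relationship, for intuition only; the library computes this internally.

# reproduce the binary labels by thresholding the raw scores
manual_labels = (outlier_scores > model.threshold_).astype(int)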

To predict outlier probabilities:

prob = model.predict_proba(data)
print('Probability:')
print(prob)

Out:

Probability:
[[0.89596552 0.10403448]
 [0.93588485 0.06411515]
 [0.89085849 0.10914151]
 ...
 [0.9883419  0.0116581 ]
 [0.8944724  0.1055276 ]
 [0.89117346 0.10882654]]
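
Each row sums to 1 and is assumed to follow the PyOD column convention [P(inlier), P(outlier)], which matches the output above. To rank nodes by suspiciousness, take the second column.

# column 1 holds the outlier probability under the assumed column order
outlier_prob = prob[:, 1]
print('Top 5 most suspicious nodes:', outlier_prob.argsort()[-5:][::-1])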

To predict the labels along with prediction confidence:

labels, confidence = model.predict(data, return_confidence=True)
print('Labels:')
print(labels)
print('Confidence:')
print(confidence)

Out:

Labels:
[0 0 0 ... 0 0 0]
Confidence:
[1. 1. 1. ... 1. 1. 1.]

To evaluate the performance of the outlier detector:

from pygod.utils.metric import (eval_roc_auc,
                                eval_recall_at_k,
                                eval_precision_at_k)

k = 200

auc_score = eval_roc_auc(data.y.numpy(), outlier_scores)
recall_at_k = eval_recall_at_k(data.y.numpy(), outlier_scores,
                               k=k, threshold=model.threshold_)
precision_at_k = eval_precision_at_k(data.y.numpy(), outlier_scores,
                                     k=k, threshold=model.threshold_)

print('AUC Score:', auc_score)
print(f'Recall@{k}:', recall_at_k)
print(f'Precision@{k}:', precision_at_k)

Out:

AUC Score: 0.9387470752632263
Recall@200: 0.5102040816326531
Precision@200: 0.5
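
For intuition, recall@k is the fraction of all true outliers recovered among the k highest-scoring nodes. The sketch below follows that definition; the library's implementation may differ in details such as tie handling.

import numpy as np

# indices of the k nodes with the largest raw scores
top_k = np.argsort(outlier_scores)[-k:]
# fraction of all true outliers that appear in the top k
manual_recall = data.y.numpy()[top_k].sum() / data.y.numpy().sum()
print('Manual Recall@{}:'.format(k), manual_recall)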

Total running time of the script: (0 minutes 11.296 seconds)
