Detector Example
In this tutorial, you will learn the basic workflow of PyGOD with an example of DOMINANT. This tutorial assumes that you have basic familiarity with PyTorch and PyTorch Geometric (PyG).
(Time estimate: 5 minutes)
Data Loading
PyGOD uses torch_geometric.data.Data to handle the data. Here, we use Cora, a PyG built-in dataset, as an example. To load your own dataset into PyGOD, refer to the "creating your own datasets" tutorial in PyG.
import torch_geometric.transforms as T
from torch_geometric.datasets import Planetoid
data = Planetoid('./data/Cora', 'Cora', transform=T.NormalizeFeatures())[0]
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.x
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.tx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.allx
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.y
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ty
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.ally
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.graph
Downloading https://github.com/kimiyoung/planetoid/raw/master/data/ind.cora.test.index
Processing...
Done!
Because Cora has no ground-truth outlier labels, we follow the method used by DOMINANT and inject 100 contextual outliers and 100 structural outliers into the graph.
Note: If your dataset already contains the outliers you want to detect, you don't have to inject more.
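The injection code itself is not shown here, so the following is only a minimal, pure-Python sketch of the two ideas: structural outliers are created by fully connecting small groups of randomly chosen nodes, and contextual outliers by overwriting a node's features with those of the farthest of a few randomly sampled candidates. The function names, the toy path graph, and all sizes are hypothetical and chosen for illustration; PyGOD ships its own generator helpers (in recent releases, pygod.generator.gen_contextual_outlier and gen_structural_outlier), which this sketch does not call.

```python
import itertools
import math
import random


def inject_structural(edges, num_nodes, clique_size, num_cliques, rng):
    """Fully connect `num_cliques` disjoint groups of `clique_size` random nodes."""
    chosen = rng.sample(range(num_nodes), clique_size * num_cliques)
    new_edges = set(edges)
    for c in range(num_cliques):
        group = chosen[c * clique_size:(c + 1) * clique_size]
        for u, v in itertools.combinations(group, 2):
            new_edges.add((min(u, v), max(u, v)))  # undirected edge, stored once
    return new_edges, set(chosen)


def inject_contextual(features, num_outliers, num_candidates, rng, exclude=()):
    """Replace each picked node's features with those of its farthest candidate."""
    pool = [i for i in range(len(features)) if i not in exclude]
    picked = rng.sample(pool, num_outliers)
    new_features = [list(row) for row in features]
    for i in picked:
        candidates = rng.sample(range(len(features)), num_candidates)
        farthest = max(candidates, key=lambda j: math.dist(features[i], features[j]))
        new_features[i] = list(features[farthest])
    return new_features, set(picked)


# Hypothetical toy graph: 50 nodes on a path, 4 random features per node.
rng = random.Random(0)
num_nodes = 50
features = [[rng.random() for _ in range(4)] for _ in range(num_nodes)]
edges = {(i, i + 1) for i in range(num_nodes - 1)}

edges, structural = inject_structural(edges, num_nodes, clique_size=5, num_cliques=2, rng=rng)
features, contextual = inject_contextual(features, 10, 8, rng, exclude=structural)

# Binary ground truth: 1 for any injected outlier, 0 otherwise.
y = [int(i in structural or i in contextual) for i in range(num_nodes)]
print(sum(y))  # 10 structural + 10 contextual = 20
```

The resulting y plays the role of the ground-truth label vector that the evaluation step compares against the detector's scores.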
We also provide various types of built-in datasets. You can load them by passing the name of the dataset to the load_data function. See the data repository for more details.
Initialization
You can use any detector by simply initializing it without passing any arguments, since default hyperparameters are provided. You can also customize the hyperparameters by passing arguments. Here, we use pygod.detector.DOMINANT as an example; the resulting instance is referred to as detector in the steps below.
Training
To train the detector with the loaded data, simply feed the torch_geometric.data.Data object into the detector via fit.
detector.fit(data)
DOMINANT(act=<function relu at 0x7fe4933e8220>,
backbone=<class 'torch_geometric.nn.models.basic_gnn.GCN'>,
batch_size=2708, compile_model=False, contamination=0.1,
dropout=0.0, epoch=100, gpu=None, hid_dim=64, lr=0.004,
num_layers=4, num_neigh=[-1, -1, -1, -1], save_emb=False,
sigmoid_s=False, verbose=0, weight=0.5, weight_decay=0.0)
Inference
After training, the detector is ready to use. You can use it to predict labels, raw outlier scores, outlier probabilities, and prediction confidence. Here, we use the loaded data as an example.
pred, score, prob, conf = detector.predict(data,
                                           return_pred=True,
                                           return_score=True,
                                           return_prob=True,
                                           return_conf=True)
print('Labels:')
print(pred)
print('Raw scores:')
print(score)
print('Probability:')
print(prob)
print('Confidence:')
print(conf)
Labels:
tensor([0, 0, 0, ..., 0, 0, 0])
Raw scores:
tensor([1.0254, 0.9655, 1.2170, ..., 0.6176, 1.1248, 1.1259])
Probability:
tensor([0.0835, 0.0724, 0.1191, ..., 0.0078, 0.1020, 0.1022])
Confidence:
tensor([1., 1., 1., ..., 1., 1., 1.])
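To make the relationship between these outputs more concrete, here is a small plain-Python sketch. It assumes that labels are obtained by flagging the top contamination fraction of raw scores (contamination=0.1 appears in the detector printout) and that probabilities are a min-max rescaling of the raw scores. Both are simplifying assumptions for illustration, the helper names and the score values are made up, and PyGOD's internal implementation may differ in detail. Prediction confidence is not covered.

```python
def labels_from_scores(scores, contamination=0.1):
    """Flag roughly the top `contamination` fraction of scores as outliers (1)."""
    ranked = sorted(scores)
    n_out = int(len(ranked) * contamination)  # number of points to flag
    cut = ranked[len(ranked) - n_out - 1]     # highest score still treated as an inlier
    return [int(s > cut) for s in scores]


def minmax_probability(scores):
    """Rescale scores linearly to [0, 1]; the largest score maps to 1."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]


# Hypothetical raw scores for ten nodes; one node scores far above the rest.
scores = [1.03, 0.97, 1.22, 0.62, 1.12, 1.13, 5.96, 0.58, 0.80, 0.91]

labels = labels_from_scores(scores, contamination=0.1)
probs = minmax_probability(scores)

print(labels)    # [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
print(probs[6])  # 1.0
```

Under these assumptions, a larger contamination flags more nodes without changing the raw scores or their ranking.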
Evaluation
To evaluate the performance of the outlier detector with the AUC score, compare the ground-truth outlier labels with the raw outlier scores:
AUC Score: 0.7676196920994756
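For intuition about what this number means, the ROC AUC can be read as the probability that a randomly chosen outlier receives a higher score than a randomly chosen inlier. The sketch below computes it directly from that definition in plain Python. The function name and the toy labels and scores are hypothetical, and this is not the routine used to produce the score above; PyGOD provides its own metric helpers (for example pygod.metric.eval_roc_auc), which would normally be used instead.

```python
def roc_auc(labels, scores):
    """Fraction of (outlier, inlier) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one outlier and one inlier")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical ground truth and scores for six nodes.
y_true = [0, 0, 1, 0, 1, 0]
y_score = [0.2, 0.4, 0.9, 0.7, 0.6, 0.1]

auc = roc_auc(y_true, y_score)
print(auc)  # 0.875
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect one. The pairwise loop is quadratic in the number of nodes, so it is suited to small examples only.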
Total running time of the script: (0 minutes 43.706 seconds)