CoLA

class pygod.detector.CoLA(hid_dim=64, num_layers=4, dropout=0.0, weight_decay=0.0, act=<function relu>, backbone=<class 'torch_geometric.nn.models.basic_gnn.GCN'>, contamination=0.1, lr=0.004, epoch=100, gpu=-1, batch_size=0, num_neigh=-1, verbose=0, save_emb=False, compile_model=False, **kwargs)

Bases: DeepDetector

Anomaly Detection on Attributed Networks via Contrastive Self-Supervised Learning

CoLA is a contrastive self-supervised learning method for graph anomaly detection. This implementation is based on random neighbor sampling rather than the random walk sampling used in the original paper.

See [LLP+21] for details.

Parameters:

hid_dim (int, optional) – Hidden dimension of model. Default: 64.
num_layers (int, optional) – Total number of layers in model. Default: 4.
dropout (float, optional) – Dropout rate. Default: 0.0.
weight_decay (float, optional) – Weight decay (L2 penalty). Default: 0.0.
act (callable activation function or None, optional) – Activation function if not None. Default: torch.nn.functional.relu.
backbone (torch.nn.Module) – The backbone of the deep detector implemented in PyG. Default: torch_geometric.nn.GCN.
contamination (float, optional) – The amount of contamination of the dataset in (0., 0.5], i.e., the proportion of outliers in the dataset. Used when fitting to define the threshold on the decision function. Default: 0.1.
lr (float, optional) – Learning rate. Default: 0.004.
epoch (int, optional) – Maximum number of training epochs. Default: 100.
gpu (int) – GPU index, -1 for using CPU. Default: -1.
batch_size (int, optional) – Minibatch size, 0 for full batch training. Default: 0.
num_neigh (int, optional) – Number of neighbors in sampling, -1 for all neighbors. Default: -1.
verbose (int, optional) – Verbosity mode. Range in [0, 3]. Larger value for printing out more log information. Default: 0.
save_emb (bool, optional) – Whether to save the embedding. Default: False.
compile_model (bool, optional) – Whether to compile the model with torch_geometric.compile. Default: False.
**kwargs – Other parameters for the backbone.

decision_score_

The outlier scores of the training data. Outliers tend to have higher scores. This value is available once the detector is fitted.

Type: torch.Tensor

threshold_

The threshold is derived from contamination: it is the score cutoff above which the \(N \times\) contamination most abnormal samples in decision_score_ fall. It is used to generate binary outlier labels.

Type: float

label_

The binary labels of the training data. 0 stands for inliers and 1 for outliers. They are generated by applying threshold_ to decision_score_.

Type: torch.Tensor
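
The relationship between contamination, threshold_, and label_ can be sketched with NumPy. This is a minimal illustration rather than the library's own code: the helper name threshold_and_labels is hypothetical, and the percentile cutoff with a strict > comparison is an assumption about how such a threshold is commonly computed.

```python
import numpy as np


def threshold_and_labels(scores, contamination=0.1):
    """Hypothetical helper: derive a cutoff and binary labels from scores."""
    scores = np.asarray(scores, dtype=float)
    # Assumed rule: the cutoff is the (1 - contamination) percentile of the
    # training scores, so roughly N * contamination samples lie above it.
    threshold = float(np.percentile(scores, 100 * (1 - contamination)))
    # 1 marks an outlier (score above the cutoff), 0 marks an inlier.
    labels = (scores > threshold).astype(int)
    return threshold, labels


# Ten toy scores with one clearly abnormal value.
scores = np.array([0.10, 0.20, 0.15, 0.30, 0.25, 0.05, 0.12, 0.18, 0.22, 2.50])
threshold, labels = threshold_and_labels(scores, contamination=0.1)
```

With ten samples and contamination=0.1, only the single highest score ends up above the cutoff.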

emb

The learned node hidden embeddings of shape \(N \times\) hid_dim. Only available when save_emb is True. When the detector has not been fitted, emb is None. When the detector has multiple embeddings, emb is a tuple of torch.Tensor.

Type: torch.Tensor or tuple of torch.Tensor or None

fit(data, label=None)

Fit detector with training data.

Parameters:

data (torch_geometric.data.Data) – The training graph.
label (torch.Tensor, optional) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model. Default: None.

Returns:

self – Fitted detector.

Return type:

object

predict(data=None, label=None, return_pred=True, return_score=False, return_prob=False, prob_method='linear', return_conf=False, return_emb=False)

Prediction for testing data using the fitted detector. Return predicted labels by default.

Parameters:

data (torch_geometric.data.Data, optional) – The testing graph. If None, the training data is used. Default: None.
label (torch.Tensor, optional) – The optional outlier ground truth labels used for testing. Default: None.
return_pred (bool, optional) – Whether to return the predicted binary labels. The labels are obtained by thresholding the raw outlier scores according to contamination. Default: True.
return_score (bool, optional) – Whether to return the raw outlier scores. Default: False.
return_prob (bool, optional) – Whether to return the outlier probabilities. Default: False.
prob_method (str, optional) –
The method to convert the outlier scores to probabilities. Two approaches are possible:

1. 'linear': simply use min-max conversion to linearly transform the outlier scores into the range of [0,1]. The model must be fitted first.

2. 'unify': use unifying scores, see [KKSZ11].

Default: 'linear'.
return_conf (bool, optional) – Whether to return the model’s confidence in making the same prediction under slightly different training sets. See [PVD20]. Default: False.
return_emb (bool, optional) – Whether to return the learned node representations. Default: False.
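
The 'linear' option of prob_method can be illustrated with a small NumPy sketch. It is not the library's implementation: the helper name linear_prob is hypothetical, and taking the minimum and maximum from the training scores and clipping the result to [0, 1] are assumptions, chosen to be consistent with the note that the model must be fitted first.

```python
import numpy as np


def linear_prob(train_scores, test_scores):
    """Hypothetical helper: min-max scale test scores into [0, 1]."""
    train_scores = np.asarray(train_scores, dtype=float)
    test_scores = np.asarray(test_scores, dtype=float)
    # Assumed: the scaling range comes from the scores seen during fitting.
    lo, hi = train_scores.min(), train_scores.max()
    if hi == lo:
        # Degenerate case: all training scores are equal, nothing to scale.
        return np.zeros_like(test_scores)
    # Test scores outside the training range are clipped to 0 or 1.
    return np.clip((test_scores - lo) / (hi - lo), 0.0, 1.0)


probs = linear_prob([0.0, 1.0, 2.0, 4.0], [-1.0, 1.0, 4.0, 8.0])
```

Here the training scores span [0, 4], so a test score of 1.0 maps to 0.25, while -1.0 and 8.0 are clipped to 0 and 1.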

Returns:

pred (torch.Tensor) – The predicted binary outlier labels of shape \(N\). 0 stands for inliers and 1 for outliers. Only available when return_pred=True.
score (torch.Tensor) – The raw outlier scores of shape \(N\). Only available when return_score=True.
prob (torch.Tensor) – The outlier probabilities of shape \(N\). Only available when return_prob=True.
conf (torch.Tensor) – The prediction confidence of shape \(N\). Only available when return_conf=True.