API CheatSheet
The following APIs are available for all detectors:
- pygod.detector.Detector.fit(): Fit the detector with training data.
- pygod.detector.Detector.predict(): Predict on test data (or on the training data if none is provided) using the fitted detector.
Key attributes of a fitted detector:
- pygod.detector.Detector.decision_score_: The outlier scores of the input data. Outliers tend to have higher scores.
- pygod.detector.Detector.label_: The binary labels of the input data. 0 stands for inliers and 1 for outliers.
- threshold_: The determined threshold for binary classification. Scores above the threshold are outliers.
Input of PyGOD: Please pass in a PyG Data object. See PyG data processing examples.
Base Detector
Detector is the abstract class for all detectors:
- class pygod.detector.Detector(contamination=0.1, verbose=0)
Bases: ABC
Abstract class for all outlier detection algorithms.
- Parameters:
contamination (float, optional) – The amount of contamination of the dataset in (0., 0.5], i.e., the proportion of outliers in the dataset. Used when fitting to define the threshold on the decision function. Default: 0.1.
verbose (int, optional) – Verbosity mode. Range in [0, 3]. Larger values print more log information. Default: 0.
- decision_score_
The outlier scores of the training data. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type: torch.Tensor
- threshold_
The threshold is based on contamination. It is set so that the \(N \times\) contamination most abnormal samples in decision_score_ lie above it. The threshold is used to generate binary outlier labels.
- Type: float
- label_
The binary labels of the training data. 0 stands for inliers and 1 for outliers. They are generated by applying threshold_ to decision_score_.
- Type: torch.Tensor
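The relationship between contamination, the threshold, and the binary labels can be illustrated with a small standard-library sketch. This is not PyGOD's implementation: the function names are hypothetical, plain Python lists stand in for tensors, and the cutoff rule (take the highest score that is still treated as an inlier) is one simple way to realize the idea described above.

```python
def threshold_from_contamination(scores, contamination=0.1):
    """Return a cutoff with about N * contamination scores strictly above it."""
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0.0 < contamination <= 0.5:
        raise ValueError("contamination must be in (0, 0.5]")
    n_outliers = int(round(len(scores) * contamination))
    ranked = sorted(scores, reverse=True)
    # The cutoff is the highest score that is still treated as an inlier.
    return ranked[n_outliers]


def labels_from_threshold(scores, threshold):
    """Label scores strictly above the threshold as outliers (1), else inliers (0)."""
    return [1 if s > threshold else 0 for s in scores]


scores = [0.1, 0.3, 0.2, 0.9, 0.15, 0.25, 0.05, 0.12, 0.18, 0.22]
threshold = threshold_from_contamination(scores, contamination=0.1)
labels = labels_from_threshold(scores, threshold)
```

With ten scores and a contamination of 0.1, exactly one sample (the score 0.9) ends up above the cutoff. When scores are tied at the cutoff, fewer samples than N × contamination may be flagged.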
- abstract decision_function(data, label=None)
Predict raw outlier scores of testing data using the fitted detector. Outliers are assigned higher outlier scores.
- Parameters:
data (torch_geometric.data.Data) – The testing graph.
label (torch.Tensor, optional) – The optional outlier ground truth labels used for testing. Default: None.
- Returns:
score – The outlier scores of shape \(N\).
- Return type: torch.Tensor
- abstract fit(data, label=None)
Fit the detector with training data.
- Parameters:
data (torch_geometric.data.Data) – The training graph.
label (torch.Tensor, optional) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model. Default: None.
- Returns:
self – Fitted detector.
- Return type: Detector
- predict(data=None, label=None, return_pred=True, return_score=False, return_prob=False, prob_method='linear', return_conf=False)
Predict on testing data using the fitted detector. Returns the predicted labels by default.
- Parameters:
data (torch_geometric.data.Data, optional) – The testing graph. If None, the training data is used. Default: None.
label (torch.Tensor, optional) – The optional outlier ground truth labels used for testing. Default: None.
return_pred (bool, optional) – Whether to return the predicted binary labels. The labels are determined by applying the contamination-based threshold to the raw outlier scores. Default: True.
return_score (bool, optional) – Whether to return the raw outlier scores. Default: False.
return_prob (bool, optional) – Whether to return the outlier probabilities. Default: False.
prob_method (str, optional) – The method to convert the outlier scores to probabilities. Two approaches are possible:
1. 'linear': use min-max conversion to linearly transform the outlier scores into the range [0, 1]. The model must be fitted first.
2. 'unify': use unifying scores, see [KKSZ11].
Default: 'linear'.
return_conf (bool, optional) – Whether to return the model’s confidence in making the same prediction under slightly different training sets. See [PVD20]. Default: False.
- Returns:
pred (torch.Tensor) – The predicted binary outlier labels of shape \(N\). 0 stands for inliers and 1 for outliers. Only available when return_pred=True.
score (torch.Tensor) – The raw outlier scores of shape \(N\). Only available when return_score=True.
prob (torch.Tensor) – The outlier probabilities of shape \(N\). Only available when return_prob=True.
conf (torch.Tensor) – The prediction confidence of shape \(N\). Only available when return_conf=True.
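The 'linear' option of prob_method can be illustrated with a short standard-library sketch of min-max conversion. It is not PyGOD's code: the function name is hypothetical, plain lists stand in for tensors, and taking the minimum and maximum from the training scores (then clamping to [0, 1]) is an assumption suggested by the note that the model must be fitted first.

```python
def linear_outlier_probability(scores, reference_scores):
    """Min-max scale scores into [0, 1] using the range of reference_scores."""
    low, high = min(reference_scores), max(reference_scores)
    span = high - low
    if span == 0:
        # All reference scores are identical, so no ranking information exists.
        return [0.0 for _ in scores]
    # Clamp so that scores outside the reference range stay within [0, 1].
    return [min(1.0, max(0.0, (s - low) / span)) for s in scores]


train_scores = [1.0, 2.0, 3.0, 5.0]
test_scores = [0.5, 3.0, 6.0]
probs = linear_outlier_probability(test_scores, train_scores)
```

Here the test score below the training minimum maps to 0.0, the one above the training maximum maps to 1.0, and 3.0 falls halfway through the training range.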
- abstract process_graph(data)
Data preprocessing for the input graph.
- Parameters:
data (torch_geometric.data.Data) – The input graph.
Deep Detector
By inheriting from the Detector class, we also provide a base deep detector class, DeepDetector, to ease the implementation of deep learning based detectors.
- class pygod.detector.DeepDetector(hid_dim=64, num_layers=2, dropout=0.0, weight_decay=0.0, act=<function relu>, backbone=<class 'torch_geometric.nn.models.basic_gnn.GIN'>, contamination=0.1, lr=0.004, epoch=100, gpu=-1, batch_size=0, num_neigh=-1, verbose=0, gan=False, save_emb=False, compile_model=False, **kwargs)
Abstract class for deep outlier detection algorithms.
- Parameters:
hid_dim (int, optional) – Hidden dimension of the model. Default: 64.
num_layers (int, optional) – Total number of layers in the model. Default: 2.
dropout (float, optional) – Dropout rate. Default: 0.0.
weight_decay (float, optional) – Weight decay (L2 penalty). Default: 0.0.
act (callable activation function or None, optional) – Activation function if not None. Default: torch.nn.functional.relu.
backbone (torch.nn.Module) – The backbone of the deep detector implemented in PyG. Default: torch_geometric.nn.GIN.
contamination (float, optional) – The amount of contamination of the dataset in (0., 0.5], i.e., the proportion of outliers in the dataset. Used when fitting to define the threshold on the decision function. Default: 0.1.
lr (float, optional) – Learning rate. Default: 0.004.
epoch (int, optional) – Maximum number of training epochs. Default: 100.
gpu (int) – GPU index, -1 for using CPU. Default: -1.
batch_size (int, optional) – Minibatch size, 0 for full batch training. Default: 0.
num_neigh (int, optional) – Number of neighbors in sampling, -1 for all neighbors. Default: -1.
gan (bool, optional) – Whether to use adversarial training. Default: False.
verbose (int, optional) – Verbosity mode. Range in [0, 3]. Larger values print more log information. Default: 0.
save_emb (bool, optional) – Whether to save the embedding. Default: False.
compile_model (bool, optional) – Whether to compile the model with torch_geometric.compile. Default: False.
**kwargs – Other parameters for the backbone.
- decision_score_
The outlier scores of the training data. Outliers tend to have higher scores. This value is available once the detector is fitted.
- Type: torch.Tensor
- threshold_
The threshold is based on contamination. It is set so that the \(N \times\) contamination most abnormal samples in decision_score_ lie above it. The threshold is used to generate binary outlier labels.
- Type: float
- label_
The binary labels of the training data. 0 stands for inliers and 1 for outliers. They are generated by applying threshold_ to decision_score_.
- Type: torch.Tensor
- emb
The learned node hidden embeddings of shape \(N \times\) hid_dim. Only available when save_emb is True. When the detector has not been fitted, emb is None. When the detector has multiple embeddings, emb is a tuple of torch.Tensor.
- Type: torch.Tensor or tuple of torch.Tensor or None
- abstract forward_model(data)
Forward pass of the neural network detector.
- Parameters:
data (torch_geometric.data.Data) – The input graph.
- Returns:
loss (torch.Tensor) – The loss of the current batch.
score (torch.Tensor) – The outlier scores of the current batch.
- abstract init_model()
Initialize the neural network detector.
- Returns:
model – The initialized neural network detector.
- Return type: torch.nn.Module
- abstract process_graph(data)
Data preprocessing for the input graph.
- Parameters:
data (torch_geometric.data.Data) – The input graph.