SCAN
- class pygod.detector.SCAN(eps=0.5, mu=2, contamination=0.1, verbose=0)
Bases: Detector
Structural Clustering Algorithm for Networks
SCAN is a clustering algorithm that takes only the graph structure as input, without the node features. Note: this model outputs the detected clusters instead of the “outliers” described in the original paper.
Note
This detector is transductive only. Using predict with unseen data will train the detector from scratch.
See [XYFS07] for details.
- Parameters:
eps (float, optional) – Neighborhood (structural-similarity) threshold. Default: 0.5.
mu (int, optional) – Minimal size of clusters. Default: 2.
contamination (float, optional) – Valid in (0., 0.5). The proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. Default: 0.1.
verbose (int, optional) – Verbosity mode. Range in [0, 3]. Larger values print more log information. Default: 0.
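To make the roles of eps and mu concrete, here is a minimal pure-Python sketch of the classic SCAN procedure described in [XYFS07]: structural similarity over closed neighborhoods, ε-neighborhoods (similarity at least eps), core nodes (at least mu ε-neighbors, the node itself included), cluster growth from cores, and the split of unclustered nodes into hubs and outliers. It is an illustration under stated assumptions, not PyGOD's implementation: the graph is assumed undirected and given as a dict of neighbor sets, and the names scan, structural_similarity, eps_neighborhood and is_core are hypothetical.

```python
import math
from collections import deque


def structural_similarity(adj, u, v):
    # sigma(u, v) = |N[u] & N[v]| / sqrt(|N[u]| * |N[v]|),
    # where N[x] is the closed neighborhood (x plus its neighbors).
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return len(nu & nv) / math.sqrt(len(nu) * len(nv))


def eps_neighborhood(adj, u, eps):
    # Closed neighbors of u whose structural similarity to u is at least eps.
    return {v for v in adj[u] | {u} if structural_similarity(adj, u, v) >= eps}


def is_core(adj, u, eps, mu):
    # A core node has at least mu nodes in its eps-neighborhood (itself included).
    return len(eps_neighborhood(adj, u, eps)) >= mu


def scan(adj, eps=0.5, mu=2):
    """Hypothetical SCAN sketch. adj: dict node -> set of neighbors (undirected).

    Returns (cluster_of, hubs, outliers); cluster_of maps clustered nodes to ids.
    """
    cluster_of = {}
    next_id = 0
    for start in adj:
        if start in cluster_of or not is_core(adj, start, eps, mu):
            continue
        # Grow a new cluster from an unclustered core by breadth-first search.
        # Only cores are expanded; border nodes keep the first cluster reaching them.
        cluster_of[start] = next_id
        queue = deque([start])
        while queue:
            core = queue.popleft()
            for v in eps_neighborhood(adj, core, eps):
                if v not in cluster_of:
                    cluster_of[v] = next_id
                    if is_core(adj, v, eps, mu):
                        queue.append(v)
        next_id += 1

    # Unclustered nodes adjacent to two or more clusters are hubs; the rest are outliers.
    hubs, outliers = set(), set()
    for u in adj:
        if u not in cluster_of:
            touched = {cluster_of[v] for v in adj[u] if v in cluster_of}
            (hubs if len(touched) >= 2 else outliers).add(u)
    return cluster_of, hubs, outliers


# Two 4-cliques {0..3} and {4..7}, bridge node 8, and pendant node 9.
EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),
         (0, 8), (4, 8), (1, 9)]
ADJ = {u: set() for u in range(10)}
for a, b in EDGES:
    ADJ[a].add(b)
    ADJ[b].add(a)

clusters, hubs, outliers = scan(ADJ, eps=0.7, mu=2)
print(clusters, hubs, outliers)
```

With eps=0.7, nodes 0 to 3 and 4 to 7 form two clusters, the bridge node 8 is a hub, and the pendant node 9 is an outlier. Lowering eps to 0.5 lets the bridge pass the threshold (its similarity to nodes 0 and 4 is 2/√15 ≈ 0.52), so the two cliques merge into a single cluster.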
- decision_score_
The outlier scores of the training data. The higher the score, the more abnormal the sample, so outliers tend to have higher scores. This value is available once the detector is fitted.
- Type: torch.Tensor
- threshold_
The threshold is based on contamination: it is set so that the \(N \times\) contamination most abnormal samples in decision_score_ lie above it. The threshold is used to generate the binary outlier labels.
- Type: float
- label_
The binary labels of the training data. 0 stands for inliers and 1 for outliers. They are generated by applying threshold_ to decision_score_.
- Type: torch.Tensor
- hub_score_
The binary hub scores of each node.
- Type: torch.Tensor
- scatter_score_
The binary scatter scores of each node, i.e., the “outlier” scores in the original paper.
- Type: torch.Tensor
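The relation between contamination, threshold_ and label_ can be sketched in plain Python. This sketch assumes the threshold is the (1 − contamination) percentile of decision_score_ computed with linear interpolation (the default rule of numpy.percentile), and that a sample is labeled 1 when its score lies strictly above the threshold. The helper names percentile and threshold_and_labels are hypothetical, and the library's exact tie handling may differ.

```python
import math


def percentile(values, q):
    # Linear-interpolation percentile, the same rule as numpy.percentile's default.
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def threshold_and_labels(scores, contamination=0.1):
    # Place the cut so that roughly N * contamination scores lie above it.
    threshold = percentile(scores, 100.0 * (1.0 - contamination))
    # 1 = outlier (score strictly above the threshold), 0 = inlier.
    labels = [1 if s > threshold else 0 for s in scores]
    return threshold, labels


scores = [0.3, 2.5, 0.1, 0.4, 0.2, 0.9, 0.5, 0.6, 0.7, 0.8]
threshold, labels = threshold_and_labels(scores, contamination=0.1)
print(threshold, labels)
```

With N = 10 and contamination = 0.1, the cut falls between the two largest scores (about 1.06), so exactly one sample, the one scoring 2.5, is labeled as an outlier.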
- fit(data, label=None)
Fit detector with training data.
- Parameters:
data (torch_geometric.data.Data) – The training graph.
label (torch.Tensor, optional) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model. Default: None.
- Returns:
self – Fitted detector.
- Return type: object
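Because SCAN uses only the graph structure, the part of the training graph that matters is its edge list. In PyTorch Geometric, a Data object stores edges in edge_index, a 2 × E tensor whose columns are (source, target) pairs, and undirected graphs usually store both directions. The hypothetical helper below converts such a pair of rows (for example, the result of data.edge_index.tolist()) into undirected neighbor sets, the form a SCAN-style structural similarity works on. It is a sketch under these assumptions, not how PyGOD processes data internally.

```python
def edge_index_to_adjacency(edge_index, num_nodes):
    """Hypothetical helper: [[src, ...], [dst, ...]] -> {node: set of neighbors}."""
    sources, targets = edge_index
    adj = {u: set() for u in range(num_nodes)}
    for u, v in zip(sources, targets):
        if u == v:
            continue  # drop self-loops; structural similarity adds each node itself
        # Treat every edge as undirected; both stored directions collapse in the sets.
        adj[u].add(v)
        adj[v].add(u)
    return adj


# Path 0 - 1 - 2 stored in both directions, plus an isolated node 3.
edge_index = [[0, 1, 1, 2], [1, 0, 2, 1]]
adj = edge_index_to_adjacency(edge_index, num_nodes=4)
print(adj)
```

Passing num_nodes explicitly keeps isolated nodes (here node 3), which have no entry in the edge list but still need a score.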
- predict(data=None, label=None, return_pred=True, return_score=False, return_prob=False, prob_method='linear', return_conf=False)
Predict on testing data using the fitted detector. By default, returns the predicted labels.
- Parameters:
data (torch_geometric.data.Data, optional) – The testing graph. If None, the training data is used. Default: None.
label (torch.Tensor, optional) – The optional outlier ground truth labels used for testing. Default: None.
return_pred (bool, optional) – Whether to return the predicted binary labels. The labels are obtained by thresholding the raw outlier scores according to contamination. Default: True.
return_score (bool, optional) – Whether to return the raw outlier scores. Default: False.
return_prob (bool, optional) – Whether to return the outlier probabilities. Default: False.
prob_method (str, optional) – The method to convert the outlier scores to probabilities. Two approaches are possible:
1. 'linear': use min-max scaling to linearly transform the outlier scores into the range [0, 1]. The model must be fitted first.
2. 'unify': use unifying scores, see [KKSZ11].
Default: 'linear'.
return_conf (bool, optional) – Whether to return the model’s confidence in making the same prediction under slightly different training sets. See [PVD20]. Default: False.
- Returns:
pred (torch.Tensor) – The predicted binary outlier labels of shape \(N\). 0 stands for inliers and 1 for outliers. Only available when return_pred=True.
score (torch.Tensor) – The raw outlier scores of shape \(N\). Only available when return_score=True.
prob (torch.Tensor) – The outlier probabilities of shape \(N\). Only available when return_prob=True.
conf (torch.Tensor) – The prediction confidence of shape \(N\). Only available when return_conf=True.