SCAN

class pygod.detector.SCAN(eps=0.5, mu=2, contamination=0.1, verbose=0)

Bases: Detector

Structural Clustering Algorithm for Networks

SCAN is a clustering algorithm that takes only the graph structure, not the node features, as input. Note: this model outputs the detected clusters rather than the “outliers” described in the original paper.

Note

This detector is transductive only. Using predict with unseen data will train the detector from scratch.

See [XYFS07] for details.

Parameters:
  • eps (float, optional) – Neighborhood threshold. Default: 0.5.

  • mu (int, optional) – Minimal size of clusters. Default: 2.

  • contamination (float, optional) – Valid in (0., 0.5). The proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. Default: 0.1.

  • verbose (int, optional) – Verbosity mode, in the range [0, 3]. Larger values print more log information. Default: 0.
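
To show how eps and mu interact, here is a minimal pure-Python sketch of the structural similarity, ε-neighborhood, and core-node test as defined in the original SCAN paper [XYFS07]. It assumes the paper's definitions (closed neighborhoods that include the node itself; a node is a core when its ε-neighborhood has at least mu members). PyGOD's implementation may differ in details such as strict versus non-strict comparison with mu, and all helper names below are hypothetical.

```python
import math
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[int, Set[int]]  # adjacency sets of an undirected graph


def build_graph(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Build undirected adjacency sets from an edge list."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def closed_neighborhood(graph: Graph, v: int) -> Set[int]:
    """Gamma(v): the node itself plus its neighbors."""
    return graph[v] | {v}


def structural_similarity(graph: Graph, v: int, w: int) -> float:
    """sigma(v, w) = |Gamma(v) & Gamma(w)| / sqrt(|Gamma(v)| * |Gamma(w)|)."""
    gv, gw = closed_neighborhood(graph, v), closed_neighborhood(graph, w)
    return len(gv & gw) / math.sqrt(len(gv) * len(gw))


def eps_neighborhood(graph: Graph, v: int, eps: float) -> Set[int]:
    """N_eps(v): nodes in Gamma(v) whose similarity to v is at least eps."""
    return {w for w in closed_neighborhood(graph, v)
            if structural_similarity(graph, v, w) >= eps}


def is_core(graph: Graph, v: int, eps: float, mu: int) -> bool:
    """A core node (paper's definition) has at least mu nodes in its eps-neighborhood."""
    return len(eps_neighborhood(graph, v, eps)) >= mu


# Two triangles joined by the bridge edge (2, 3).
g = build_graph([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
print(structural_similarity(g, 0, 1))  # 1.0
print(structural_similarity(g, 2, 3))  # 0.5
print(is_core(g, 0, eps=0.5, mu=2))    # True
```

Raising eps shrinks each ε-neighborhood (at eps=0.9, node 2's neighborhood contains only node 2 itself), so fewer nodes qualify as cores and the resulting clusters become tighter.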

decision_score_

The outlier scores of the training data; higher scores indicate more abnormal samples. This value is available once the detector is fitted.

Type:

torch.Tensor

threshold_

The threshold derived from contamination: the value of decision_score_ that separates the \(N \times \text{contamination}\) most abnormal samples from the rest. It is used to generate the binary outlier labels.

Type:

float

label_

The binary labels of the training data. 0 stands for inliers and 1 for outliers. They are generated by applying threshold_ to decision_score_.

Type:

torch.Tensor
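
As a rough illustration of how threshold_ and label_ follow from decision_score_, the sketch below assumes a percentile rule: the threshold is the (1 − contamination) quantile of the training scores, using linear interpolation as numpy.percentile does by default, and a sample is labeled 1 when its score is strictly above the threshold. This matches the description above but is not guaranteed to be PyGOD's exact code; plain Python lists stand in for torch.Tensor, and the function names are hypothetical.

```python
from typing import List, Tuple


def percentile(values: List[float], q: float) -> float:
    """Linear-interpolation percentile for q in [0, 100], like numpy's default."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def threshold_and_labels(scores: List[float],
                         contamination: float = 0.1) -> Tuple[float, List[int]]:
    """Return (threshold, labels); label 1 marks scores above the threshold."""
    threshold = percentile(scores, 100 * (1 - contamination))
    labels = [1 if s > threshold else 0 for s in scores]
    return threshold, labels


scores = [0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.5, 0.6, 0.7, 3.0]
threshold, labels = threshold_and_labels(scores, contamination=0.1)
print(labels)  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```

With N = 10 and contamination = 0.1, exactly one sample (the one scoring 3.0) ends up above the threshold, matching the \(N \times \text{contamination}\) description.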

hub_score_

The binary hub scores of each node.

Type:

torch.Tensor

scatter_score_

The binary scatter scores of each node, i.e., the “outlier” scores in the original paper.

Type:

torch.Tensor
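
In the original SCAN paper [XYFS07], a node that belongs to no cluster is a hub if its neighbors belong to two or more different clusters, and an outlier (scatter) otherwise. The sketch below derives binary hub and scatter indicators from those definitions, assuming a precomputed mapping from clustered nodes to cluster ids and nodes numbered 0 to n−1. Both inputs and the function name are hypothetical; this illustrates the paper's definitions, not how PyGOD computes hub_score_ or scatter_score_.

```python
from typing import Dict, List, Set, Tuple


def hub_and_scatter(adj: Dict[int, Set[int]],
                    cluster_of: Dict[int, int]) -> Tuple[List[int], List[int]]:
    """Binary hub/scatter indicators for nodes 0..n-1.

    adj: undirected adjacency sets; cluster_of: cluster id of each
    clustered node (nodes missing from it are non-members).
    """
    n = len(adj)
    hub = [0] * n
    scatter = [0] * n
    for v in range(n):
        if v in cluster_of:
            continue  # cluster members are neither hubs nor outliers
        touched = {cluster_of[w] for w in adj[v] if w in cluster_of}
        if len(touched) >= 2:
            hub[v] = 1      # bridges two or more clusters
        else:
            scatter[v] = 1  # attached to at most one cluster
    return hub, scatter


# Clusters {0, 1, 2} and {4, 5, 6}; node 3 bridges them, node 7 hangs off node 6.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4},
       4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5, 7}, 7: {6}}
cluster_of = {0: 0, 1: 0, 2: 0, 4: 1, 5: 1, 6: 1}
hub, scatter = hub_and_scatter(adj, cluster_of)
print(hub)      # [0, 0, 0, 1, 0, 0, 0, 0]
print(scatter)  # [0, 0, 0, 0, 0, 0, 0, 1]
```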

fit(data, label=None)

Fit detector with training data.

Parameters:
  • data (torch_geometric.data.Data) – The training graph.

  • label (torch.Tensor, optional) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model. Default: None.

Returns:

self – Fitted detector.

Return type:

object

predict(data=None, label=None, return_pred=True, return_score=False, return_prob=False, prob_method='linear', return_conf=False)

Predict on testing data using the fitted detector. Returns the predicted labels by default.

Parameters:
  • data (torch_geometric.data.Data, optional) – The testing graph. If None, the training data is used. Default: None.

  • label (torch.Tensor, optional) – The optional outlier ground truth labels used for testing. Default: None.

  • return_pred (bool, optional) – Whether to return the predicted binary labels, obtained by thresholding the raw outlier scores according to contamination. Default: True.

  • return_score (bool, optional) – Whether to return the raw outlier scores. Default: False.

  • return_prob (bool, optional) – Whether to return the outlier probabilities. Default: False.

  • prob_method (str, optional) –

    The method to convert the outlier scores to probabilities. Two approaches are possible:

    1. 'linear': use min-max scaling to linearly transform the outlier scores into the range [0, 1]. The model must be fitted first.

    2. 'unify': use unifying scores, see [KKSZ11].

    Default: 'linear'.

  • return_conf (bool, optional) – Whether to return the model’s confidence in making the same prediction under slightly different training sets. See [PVD20]. Default: False.
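
The two prob_method options can be sketched in plain Python as follows. The 'linear' branch min-max scales test scores using the minimum and maximum of the training scores and clips the result to [0, 1]. The 'unify' branch is an assumed Gaussian-scaling variant in the spirit of [KKSZ11]: standardize with the training mean and sample standard deviation, apply the error function, and clip to [0, 1]. The function name and exact formulas are assumptions for illustration, not PyGOD's code.

```python
import math
import statistics
from typing import List


def scores_to_prob(train: List[float], test: List[float],
                   method: str = "linear") -> List[float]:
    """Map raw outlier scores to [0, 1] using statistics of the training scores."""
    def clip(x: float) -> float:
        return min(max(x, 0.0), 1.0)

    if method == "linear":
        lo, hi = min(train), max(train)  # assumes max(train) > min(train)
        return [clip((s - lo) / (hi - lo)) for s in test]
    if method == "unify":
        mu = statistics.mean(train)
        sigma = statistics.stdev(train)  # sample standard deviation
        return [clip(math.erf((s - mu) / (sigma * math.sqrt(2)))) for s in test]
    raise ValueError(f"unknown method: {method!r}")


train_scores = [0.0, 1.0, 2.0, 3.0, 4.0]
print(scores_to_prob(train_scores, [2.0, 4.0, 6.0], "linear"))  # [0.5, 1.0, 1.0]
print(scores_to_prob(train_scores, [2.0, 6.0], "unify"))
```

Because both branches rely on statistics of the training scores, a score at or below the training mean maps to 0 under 'unify', while 'linear' only reaches 0 at the training minimum.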

Returns:

  • pred (torch.Tensor) – The predicted binary outlier labels of shape \(N\). 0 stands for inliers and 1 for outliers. Only available when return_pred=True.

  • score (torch.Tensor) – The raw outlier scores of shape \(N\). Only available when return_score=True.

  • prob (torch.Tensor) – The outlier probabilities of shape \(N\). Only available when return_prob=True.

  • conf (torch.Tensor) – The prediction confidence of shape \(N\). Only available when return_conf=True.