API CheatSheet

The following APIs are available on all detector models.

Key Attributes of a fitted model:

For the inductive setting:

Input of PyGOD: Please pass in a PyTorch Geometric (PyG) data object. See PyG data processing examples.

See base class definition below:

pygod.models.base module

Base classes for all outlier detectors.

class pygod.models.base.BaseDetector(contamination=0.1)

Bases: object

Abstract class for all outlier detection algorithms.

Parameters:

contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.

decision_scores_

The outlier scores of the training data. Higher scores indicate more abnormal samples, so outliers tend to have higher scores. This value is available once the detector is fitted.

Type:

numpy array of shape (n_samples,)

threshold_

The score threshold derived from contamination. It is set so that the n_samples * contamination most abnormal samples in decision_scores_ are flagged as outliers, and it is used to generate binary outlier labels.

Type:

float

labels_

The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. They are generated by applying threshold_ to decision_scores_.

Type:

numpy array of shape (n_samples,), each entry an int, either 0 or 1
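The relationship between contamination, threshold_ and labels_ can be sketched in plain Python. This is a minimal illustration under one reading of the descriptions above: the threshold is taken as the score of the k-th most abnormal training sample, with k = int(n_samples * contamination). The helper name threshold_and_labels is hypothetical and not part of PyGOD, and the library's own percentile computation and tie handling may differ.

```python
def threshold_and_labels(decision_scores, contamination=0.1):
    """Hypothetical helper for illustration; not a PyGOD function."""
    if not 0.0 < contamination < 0.5:
        raise ValueError("contamination must be in (0., 0.5)")
    n_samples = len(decision_scores)
    # Number of training samples to flag (assumption: flag at least one).
    n_outliers = max(1, int(n_samples * contamination))
    # Score of the n_outliers-th most abnormal sample.
    threshold = sorted(decision_scores, reverse=True)[n_outliers - 1]
    # Assumption: scores at or above the threshold are labeled 1 (outlier).
    labels = [1 if score >= threshold else 0 for score in decision_scores]
    return threshold, labels


scores = [0.10, 0.20, 0.15, 0.90, 0.30, 0.25, 0.12, 0.18, 0.22, 0.80]
threshold, labels = threshold_and_labels(scores, contamination=0.2)
print(threshold, labels)
```

With ten scores and a contamination of 0.2, the two highest-scoring samples end up labeled 1 and the rest 0.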

decision_function(G)

Predict raw outlier scores of PyG Graph G using the fitted detector. The outlier score of an input sample is computed based on the fitted detector. For consistency, outliers are assigned higher anomaly scores.

Parameters:

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns:

outlier_scores – The outlier scores, of shape N.

Return type:

numpy.ndarray

fit(G, y_true=None)

Fit detector.

Parameters:
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • y_true (numpy.ndarray, optional) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model. Default: None.

Returns:

self – Fitted estimator.

Return type:

object

get_params(deep=True)

Get parameters for this estimator. See sklearn.base.BaseEstimator for more information.

Parameters:

deep (bool, optional) – If True, will return the parameters for this estimator and contained sub-objects that are estimators. Default: True.

Returns:

params – Parameter names mapped to their values.

Return type:

mapping of string to any

predict(G, return_confidence=False)

Predict if a particular sample is an outlier or not.

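Parameters:
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • return_confidence (boolean, optional (default=False)) – If True, also return the confidence of prediction.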

Returns:

  • outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

  • confidence (numpy array of shape (n_samples,).) – Only if return_confidence is set to True.
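The fit / decision_function / predict pattern documented above can be illustrated with a toy detector. This is only a sketch: ToyDegreeDetector is a hypothetical class that is not part of PyGOD, it scores nodes by their degree, and the graph is a plain-Python stand-in (a SimpleNamespace with num_nodes and edge_index) rather than a real torch_geometric.data.Data object. The thresholding rule is likewise an assumption.

```python
from types import SimpleNamespace


class ToyDegreeDetector:
    """Toy stand-in following the fit / decision_function / predict pattern.

    Hypothetical class, not part of PyGOD: each node is scored by its degree.
    """

    def __init__(self, contamination=0.1):
        self.contamination = contamination

    def decision_function(self, G):
        # One score per node: how many edge endpoints reference that node.
        scores = [0.0] * G.num_nodes
        for endpoints in G.edge_index:  # two rows: sources, then targets
            for node in endpoints:
                scores[node] += 1.0
        return scores

    def fit(self, G, y_true=None):
        # y_true is accepted for interface parity only; it is ignored here.
        self.decision_scores_ = self.decision_function(G)
        k = max(1, int(len(self.decision_scores_) * self.contamination))
        self.threshold_ = sorted(self.decision_scores_, reverse=True)[k - 1]
        self.labels_ = [int(s >= self.threshold_) for s in self.decision_scores_]
        return self

    def predict(self, G):
        # 1 for outliers, 0 for inliers, using the threshold learned in fit.
        return [int(s >= self.threshold_) for s in self.decision_function(G)]


# Stand-in for a PyG Data object: a star graph centered on node 0.
G = SimpleNamespace(num_nodes=5, edge_index=[[0, 0, 0, 0], [1, 2, 3, 4]])
detector = ToyDegreeDetector(contamination=0.2).fit(G)
print(detector.decision_scores_)
print(detector.predict(G))
```

The fitted attributes (decision_scores_, threshold_, labels_) are set inside fit, and predict reuses the stored threshold, which mirrors the attribute descriptions given for BaseDetector.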

predict_confidence(G)

Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].

Parameters:

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns:

confidence – For each observation, tells how consistently the model would make the same prediction if the training set were perturbed. Returns a probability in [0,1].

Return type:

numpy array of shape (n_samples,)

predict_proba(G, method='linear', return_confidence=False)

Predict the probability of a sample being an outlier. Two approaches are possible:

  1. Use min-max conversion to linearly transform the outlier scores into the range [0,1]. The model must be fitted first.

  2. Use unifying scores, see [KKSZ11].

Parameters:
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • method (str, optional (default='linear')) – Probability conversion method. It must be one of ‘linear’ or ‘unify’.

  • return_confidence (boolean, optional(default=False)) – If True, also return the confidence of prediction.

Returns:

outlier_probability – For each observation, the probability that it is an outlier according to the fitted model, in the range [0,1]. The number of columns depends on the number of classes, which is 2 by default ([proba of normal, proba of outliers]).

Return type:

numpy array of shape (n_samples, n_classes)
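The 'linear' approach can be sketched as follows. This is a minimal plain-Python illustration, assuming the min-max range is taken from the training scores and that scores outside that range are clipped to [0,1]; the helper name linear_outlier_proba is hypothetical, and PyGOD's actual implementation may differ in these details.

```python
def linear_outlier_proba(train_scores, new_scores):
    """Hypothetical helper for illustration; not a PyGOD function."""
    low, high = min(train_scores), max(train_scores)
    span = high - low
    probabilities = []
    for score in new_scores:
        # Min-max scale against the training range; guard constant scores.
        p = (score - low) / span if span > 0 else 0.0
        # Assumption: clip scores that fall outside the training range.
        p = min(1.0, max(0.0, p))
        # Two columns per sample: [proba of normal, proba of outlier].
        probabilities.append([1.0 - p, p])
    return probabilities


train_scores = [1.0, 2.0, 3.0, 5.0]
proba = linear_outlier_proba(train_scores, [1.0, 3.0, 6.0])
print(proba)
```

Each row sums to 1, matching the (n_samples, n_classes) shape with two classes.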

process_graph(G)

Process the raw PyG data object into a tuple of sub data objects needed for the underlying model. For instance, if training the model needs the node features and edge index, return (G.x, G.edge_index).

Parameters:

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns:

processed_data – The necessary information from the raw PyG Data object.

Return type:

tuple of data objects
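The (G.x, G.edge_index) case mentioned above can be sketched with a stand-in graph. This is an illustration only: the SimpleNamespace object mimics the attribute names of a PyG Data object but is not one, and the standalone process_graph function below is a hypothetical version of what a concrete detector might implement as a method.

```python
from types import SimpleNamespace


def process_graph(G):
    """Return only the parts of G that a hypothetical model needs."""
    return G.x, G.edge_index


# Stand-in for a PyG Data object with three nodes and two edges.
G = SimpleNamespace(
    x=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],  # node features
    edge_index=[[0, 1], [1, 2]],             # sources, then targets
    y=[0, 0, 1],                             # labels, not needed by the model
)
processed = process_graph(G)
x, edge_index = processed
print(len(processed))
```

Attributes the model does not need, such as y here, are simply left out of the returned tuple.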

set_params(**params)

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See sklearn.base.BaseEstimator for more information.

Returns:

self

Return type:

object
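The <component>__<parameter> convention can be illustrated with a small self-contained class. This is a sketch of the idea, not scikit-learn's or PyGOD's code: SimpleEstimator and the backbone parameter name are hypothetical and exist only for this example.

```python
class SimpleEstimator:
    """Hypothetical minimal estimator with get_params / set_params."""

    def __init__(self, **params):
        self._param_names = list(params)
        for name, value in params.items():
            setattr(self, name, value)

    def get_params(self, deep=True):
        params = {}
        for name in self._param_names:
            value = getattr(self, name)
            params[name] = value
            # With deep=True, also expose parameters of nested estimators.
            if deep and hasattr(value, "get_params"):
                for sub_name, sub_value in value.get_params(deep=True).items():
                    params[f"{name}__{sub_name}"] = sub_value
        return params

    def set_params(self, **params):
        for key, value in params.items():
            # "backbone__lr" splits into component "backbone" and parameter "lr".
            name, separator, sub_key = key.partition("__")
            if name not in self._param_names:
                raise ValueError(f"Invalid parameter {name!r}")
            if separator:
                getattr(self, name).set_params(**{sub_key: value})
            else:
                setattr(self, name, value)
        return self


inner = SimpleEstimator(lr=0.005)
outer = SimpleEstimator(contamination=0.1, backbone=inner)
outer.set_params(contamination=0.05, backbone__lr=0.01)
print(sorted(outer.get_params()))
```

Because set_params returns self, calls can be chained, which matches the documented return value.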

class pygod.models.base.DeepDetector(in_dim=None, hid_dim=64, num_layers=4, dropout=0.3, weight_decay=0.0, act=torch.nn.functional.relu, contamination=0.1, lr=0.005, epoch=5, gpu=-1, batch_size=0, num_neigh=-1, scalable=False, verbose=False, **kwargs)

Bases: BaseDetector, ABC

Abstract class for deep outlier detection algorithms.

Parameters:
  • in_dim (int, optional) – Input dimension of model. Default: None.

  • hid_dim (int, optional) – Hidden dimension of model. Default: 64.

  • num_layers (int, optional) – Total number of layers in model. A half (floor) of the layers are for the encoder, the other half (ceil) of the layers are for decoders. Default: 4.

  • dropout (float, optional) – Dropout rate. Default: 0.3.

  • weight_decay (float, optional) – Weight decay (L2 penalty). Default: 0.0.

  • act (callable activation function or None, optional) – Activation function if not None. Default: torch.nn.functional.relu.

  • contamination (float, optional) – Valid in (0., 0.5). The proportion of outliers in the data set. Used when fitting to define the threshold on the decision function. Default: 0.1.

  • lr (float, optional) – Learning rate. Default: 0.005.

  • epoch (int, optional) – Maximum number of training epochs. Default: 5.

  • gpu (int) – GPU index, -1 for using CPU. Default: -1.

  • batch_size (int, optional) – Minibatch size, 0 for full batch training. Default: 0.

  • num_neigh (int, optional) – Number of neighbors in sampling, -1 for all neighbors. Default: -1.

  • scalable (bool, optional) – Whether to use the scalable version of the model. Default: False.

  • verbose (bool) – Verbosity mode. Turn on to print out log information. Default: False.
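The floor/ceil split described for num_layers can be worked through in a few lines. This sketch only reproduces the arithmetic stated in the parameter description; the helper name split_layers is hypothetical and not part of PyGOD.

```python
import math


def split_layers(num_layers):
    """Split a total layer count into (encoder_layers, decoder_layers)."""
    encoder_layers = math.floor(num_layers / 2)  # half, rounded down
    decoder_layers = math.ceil(num_layers / 2)   # half, rounded up
    return encoder_layers, decoder_layers


for total in (4, 5):
    print(total, split_layers(total))
```

With the default of 4 layers both halves get 2 layers; with an odd count such as 5, the decoder side receives the extra layer.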

decision_function(G)

Predict raw anomaly scores using the fitted detector. Outliers are assigned larger anomaly scores.

Parameters:

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input data.

Returns:

outlier_scores – The anomaly scores, of shape N.

Return type:

numpy.ndarray

fit(G, y_true=None)

Fit detector with input data.

Parameters:
  • G (torch_geometric.data.Data) – The input data.

  • y_true (numpy.ndarray, optional) – The optional outlier ground truth labels used to monitor the training progress. They are not used to optimize the unsupervised model. Default: None.

Returns:

self – Fitted estimator.

Return type:

object

get_params(deep=True)

Get parameters for this estimator. See sklearn.base.BaseEstimator for more information.

Parameters:

deep (bool, optional) – If True, will return the parameters for this estimator and contained sub-objects that are estimators. Default: True.

Returns:

params – Parameter names mapped to their values.

Return type:

mapping of string to any

predict(G, return_confidence=False)

Predict if a particular sample is an outlier or not.

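Parameters:
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • return_confidence (boolean, optional (default=False)) – If True, also return the confidence of prediction.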

Returns:

  • outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

  • confidence (numpy array of shape (n_samples,).) – Only if return_confidence is set to True.

predict_confidence(G)

Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].

Parameters:

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns:

confidence – For each observation, tells how consistently the model would make the same prediction if the training set were perturbed. Returns a probability in [0,1].

Return type:

numpy array of shape (n_samples,)

predict_proba(G, method='linear', return_confidence=False)

Predict the probability of a sample being an outlier. Two approaches are possible:

  1. Use min-max conversion to linearly transform the outlier scores into the range [0,1]. The model must be fitted first.

  2. Use unifying scores, see [KKSZ11].

Parameters:
  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • method (str, optional (default='linear')) – Probability conversion method. It must be one of ‘linear’ or ‘unify’.

  • return_confidence (boolean, optional(default=False)) – If True, also return the confidence of prediction.

Returns:

outlier_probability – For each observation, the probability that it is an outlier according to the fitted model, in the range [0,1]. The number of columns depends on the number of classes, which is 2 by default ([proba of normal, proba of outliers]).

Return type:

numpy array of shape (n_samples, n_classes)

process_graph(G)

Process the raw PyG data object into a tuple of sub data objects needed for the underlying model. For instance, if training the model needs the node features and edge index, return (G.x, G.edge_index).

Parameters:

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns:

processed_data – The necessary information from the raw PyG Data object.

Return type:

tuple of data objects

set_params(**params)

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See sklearn.base.BaseEstimator for more information.

Returns:

self

Return type:

object