API CheatSheet

The APIs below are shared by all detector models.

Key attributes of a fitted model are decision_scores_, threshold_, and labels_, all documented in the base class below.

For the inductive setting:

Input of PyGOD: Please pass in a PyTorch Geometric (PyG) data object. See PyG data processing examples.

See base class definition below:

pygod.models.base module

Base class for all outlier detector models

class pygod.models.base.BaseDetector(contamination=0.1)

Bases: object

Abstract class for all outlier detection algorithms.

Parameters:

contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.

decision_scores_

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

Type:

numpy array of shape (n_samples,)

threshold_

The score threshold derived from contamination. It is set so that the n_samples * contamination most abnormal samples in decision_scores_ are flagged as outliers. The threshold is used to generate binary outlier labels.

Type:

float

labels_

The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. They are generated by applying threshold_ to decision_scores_.

Type:

int, either 0 or 1
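The relationship between contamination, threshold_, and labels_ described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper name `threshold_and_labels` is hypothetical, and placing the cut at the score of the k-th most abnormal sample (with k = ceil(n_samples * contamination)) is an assumption. The library may use a percentile-based rule instead.

```python
import math


def threshold_and_labels(scores, contamination=0.1):
    """Derive a score threshold and binary labels from outlier scores.

    Assumption: the cut sits at the score of the k-th most abnormal
    sample, with k = ceil(n_samples * contamination).
    """
    if not 0.0 < contamination < 0.5:
        raise ValueError("contamination must be in (0., 0.5)")
    n_outliers = max(1, math.ceil(len(scores) * contamination))
    # Sort descending so the most abnormal samples come first.
    ranked = sorted(scores, reverse=True)
    threshold = ranked[n_outliers - 1]
    # 1 marks an outlier, 0 an inlier.
    labels = [1 if s >= threshold else 0 for s in scores]
    return threshold, labels


scores = [0.1, 0.2, 0.15, 0.9, 0.12, 0.11, 0.3, 0.13, 0.14, 0.95]
threshold, labels = threshold_and_labels(scores, contamination=0.2)
print(threshold, labels)
```

With ten samples and a contamination of 0.2, the two highest-scoring samples are labelled 1 and the rest 0.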

decision_function(G)

Predict raw anomaly scores of PyG graph G using the fitted detector. The anomaly score of an input sample is computed based on the fitted detector. For consistency, outliers are assigned higher anomaly scores.

Parameters:

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns:

anomaly_scores – The anomaly scores of the input graph.

Return type:

numpy array of shape (n_samples,)

fit(G)

Fit the detector to the input graph. Labels are ignored in unsupervised methods.

Parameters:

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns:

self – Fitted estimator.

Return type:

object
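Because fit returns the fitted estimator itself, calls can be chained. The sketch below illustrates that contract with a hypothetical toy class, `DegreeDetector`, which is not part of PyGOD and scores each node by its degree. A `SimpleNamespace` with plain lists stands in for a torch_geometric.data.Data object so the example runs without PyG installed.

```python
from types import SimpleNamespace


class DegreeDetector:
    """Hypothetical toy detector: a node's score is its degree."""

    def __init__(self, contamination=0.1):
        self.contamination = contamination

    def decision_function(self, G):
        # Count how often each node appears as an edge endpoint.
        degree = [0] * G.num_nodes
        sources, targets = G.edge_index
        for u, v in zip(sources, targets):
            degree[u] += 1
            degree[v] += 1
        return [float(d) for d in degree]

    def fit(self, G):
        # Store the training scores, then return self to allow chaining.
        self.decision_scores_ = self.decision_function(G)
        return self


# Stand-in for a PyG Data object: a 4-node star centred on node 0.
G = SimpleNamespace(num_nodes=4, edge_index=[[0, 0, 0], [1, 2, 3]])
detector = DegreeDetector().fit(G)
print(detector.decision_scores_)
```

Here the hub node receives the highest score, consistent with the convention that higher scores are more abnormal.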

get_params(deep=True)

Get parameters for this estimator. See sklearn.base.BaseEstimator for more information.

Parameters:

deep (bool, optional) – If True, will return the parameters for this estimator and contained sub-objects that are estimators. Default: `True`.

Returns:

params – Parameter names mapped to their values.

Return type:

mapping of string to any

predict(G, return_confidence=False)

Predict if a particular sample is an outlier or not.

Parameters:

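  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • return_confidence (boolean, optional (default=False)) – If True, also return the confidence of prediction.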

Returns:

  • outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

  • confidence (numpy array of shape (n_samples,)) – Only if return_confidence is set to True.

predict_confidence(G)

Predict the model’s confidence in making the same prediction under slightly different training sets. See [PVD20].

Parameters:

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns:

confidence – For each observation, tells how consistently the model would make the same prediction if the training set were perturbed. Returns a probability in the range [0, 1].

Return type:

numpy array of shape (n_samples,)

predict_proba(G, method='linear', return_confidence=False)

Predict the probability of a sample being an outlier. Two approaches are possible:

  1. Use min-max conversion to linearly transform the outlier scores into the range [0, 1]. The model must be fitted first.

  2. Use unifying scores, see [KKSZ11].

Parameters:

  • G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

  • method (str, optional (default='linear')) – probability conversion method. It must be one of ‘linear’ or ‘unify’.

  • return_confidence (boolean, optional(default=False)) – If True, also return the confidence of prediction.

Returns:

outlier_probability – For each observation, the probability that it is an outlier according to the fitted model, in the range [0, 1]. The number of columns depends on the number of classes, which is 2 by default ([probability of normal, probability of outlier]).

Return type:

numpy array of shape (n_samples, n_classes)
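The 'linear' method described above can be sketched as a min-max rescaling. This is an illustration under stated assumptions, not the library's code: the function name `linear_outlier_proba` is hypothetical, the scaling range is assumed to come from the training scores, and values outside that range are assumed to be clipped. The 'unify' method is not shown.

```python
def linear_outlier_proba(train_scores, new_scores):
    """Min-max scale scores into [0, 1] using the training score range.

    Assumption: the range is taken from the training scores, and new
    scores outside that range are clipped to 0 or 1.
    """
    low, high = min(train_scores), max(train_scores)
    span = high - low
    if span == 0:
        raise ValueError("training scores are constant; cannot rescale")
    probabilities = []
    for score in new_scores:
        p_outlier = min(1.0, max(0.0, (score - low) / span))
        # One row per sample: [probability of normal, probability of outlier].
        probabilities.append([1.0 - p_outlier, p_outlier])
    return probabilities


train_scores = [1.0, 2.0, 3.0, 5.0]
proba = linear_outlier_proba(train_scores, [1.0, 3.0, 7.0])
print(proba)
```

Each row sums to 1, matching the two-class layout of the documented return value.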

process_graph(G)

Process the raw PyG data object into a tuple of sub data objects needed for the underlying model. For instance, if training the model needs the node features and edge index, return (G.x, G.edge_index).

Parameters:

G (PyTorch Geometric Data instance (torch_geometric.data.Data)) – The input graph.

Returns:

processed_data – The necessary information from the raw PyG Data object.

Return type:

tuple of data object
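The (G.x, G.edge_index) case mentioned above might look like the following sketch. A `SimpleNamespace` holding plain lists stands in for torch_geometric.data.Data so the example runs without PyG. In the library, process_graph is a method of the detector, whereas this illustration uses a free function for brevity.

```python
from types import SimpleNamespace


def process_graph(G):
    """Return only the fields a hypothetical model needs for training."""
    # This sketch assumes the model consumes node features and edge index.
    return G.x, G.edge_index


# Stand-in for torch_geometric.data.Data with lists instead of tensors.
G = SimpleNamespace(
    x=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],  # 3 nodes, 2 features each
    edge_index=[[0, 1], [1, 2]],              # edges 0->1 and 1->2
    y=[0, 0, 1],                              # not needed by this model
)
x, edge_index = process_graph(G)
print(len(x), edge_index)
```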

set_params(**params)

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. See sklearn.base.BaseEstimator for more information.

Returns:

self

Return type:

object
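The <component>__<parameter> convention used by get_params and set_params can be sketched with two hypothetical classes, `ToyDetector` and `ToyEncoder`. Neither is part of PyGOD or scikit-learn, and the real behavior is defined by sklearn.base.BaseEstimator. The sketch only shows how a nested key is split and delegated to a component.

```python
class ToyEncoder:
    """Hypothetical sub-estimator with a single parameter."""

    def __init__(self, hidden_dim=64):
        self.hidden_dim = hidden_dim

    def get_params(self, deep=True):
        return {"hidden_dim": self.hidden_dim}

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self


class ToyDetector:
    """Hypothetical estimator that contains a ToyEncoder component."""

    def __init__(self, contamination=0.1, encoder=None):
        self.contamination = contamination
        self.encoder = encoder

    def get_params(self, deep=True):
        params = {"contamination": self.contamination, "encoder": self.encoder}
        if deep and hasattr(self.encoder, "get_params"):
            # Expose sub-estimator parameters as <component>__<parameter>.
            for name, value in self.encoder.get_params(deep=True).items():
                params["encoder__" + name] = value
        return params

    def set_params(self, **params):
        for key, value in params.items():
            component, sep, sub_key = key.partition("__")
            if sep:
                # Delegate nested keys to the named component.
                getattr(self, component).set_params(**{sub_key: value})
            else:
                setattr(self, key, value)
        return self


detector = ToyDetector(encoder=ToyEncoder())
detector.set_params(contamination=0.05, encoder__hidden_dim=16)
params = detector.get_params()
print(params["contamination"], params["encoder__hidden_dim"])
```

Passing deep=False to get_params in this sketch returns only the top-level parameters, without the encoder__ entries.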